Title: Introduction to Soccer Pass Network Analysis with Python

Author: Indranil Ghosh

Institute: School of Fundamental Sciences, Massey University

Twitter: @indraghosh314

Website: https://indrag49.github.io/

Date: 31-07-2021



Abstract

This talk teaches four simple concepts to those who want to start working on football data analysis:

  • How to get open access event data from statsbomb using statsbombpy,

  • How to draw a soccer pitch using mplsoccer,

  • How to visualize a pass network for a particular team in a particular match, and

  • How to use the NetworkX module to analyze the pass network.

Start with statsbombpy

  • Install statsbombpy with pip using the following command:
In [ ]:
pip install statsbombpy

The open data from StatsBomb can be accessed without any authentication, but it is always advisable to go through the Terms & Conditions section of their documentation page.

  • Now we will go step by step to understand how to extract the relevant data. Before that, we need to import the statsbombpy package.
In [2]:
from statsbombpy import sb
  • We then import the numpy and the pandas packages that help us manipulate our datasets and perform analyses like data cleaning and data extraction.
In [3]:
import numpy as np
import pandas as pd
  • To get access to the Competitions dataset type the following:
In [4]:
comp = sb.competitions()
credentials were not supplied. open data access only
  • The dataset comp looks like this:
In [5]:
comp.head(15)
Out[5]:
competition_id season_id country_name competition_name competition_gender season_name match_updated match_available
0 16 4 Europe Champions League male 2018/2019 2021-04-19T17:36:05.724116 2021-04-19T17:36:05.724116
1 16 1 Europe Champions League male 2017/2018 2021-01-23T21:55:30.425330 2021-01-23T21:55:30.425330
2 16 2 Europe Champions League male 2016/2017 2020-08-26T12:33:15.869622 2020-07-29T05:00
3 16 27 Europe Champions League male 2015/2016 2020-08-26T12:33:15.869622 2020-07-29T05:00
4 16 26 Europe Champions League male 2014/2015 2020-08-26T12:33:15.869622 2020-07-29T05:00
5 16 25 Europe Champions League male 2013/2014 2020-08-26T12:33:15.869622 2020-07-29T05:00
6 16 24 Europe Champions League male 2012/2013 2020-08-26T12:33:15.869622 2020-07-29T05:00
7 16 23 Europe Champions League male 2011/2012 2020-08-26T12:33:15.869622 2020-07-29T05:00
8 16 22 Europe Champions League male 2010/2011 2020-07-29T05:00 2020-07-29T05:00
9 16 21 Europe Champions League male 2009/2010 2020-07-29T05:00 2020-07-29T05:00
10 16 41 Europe Champions League male 2008/2009 2020-08-30T10:18:39.435424 2020-08-30T10:18:39.435424
11 16 39 Europe Champions League male 2006/2007 2021-03-31T04:18:30.437060 2021-03-31T04:18:30.437060
12 16 37 Europe Champions League male 2004/2005 2021-04-01T06:18:57.459032 2021-04-01T06:18:57.459032
13 16 44 Europe Champions League male 2003/2004 2021-04-01T00:34:59.472485 2021-04-01T00:34:59.472485
14 16 76 Europe Champions League male 1999/2000 2020-07-29T05:00 2020-07-29T05:00
  • We can extract the column names of comp to understand the dataset better and draw out relevant information from the same. Type the following:
In [6]:
print(comp.columns)
Index(['competition_id', 'season_id', 'country_name', 'competition_name',
       'competition_gender', 'season_name', 'match_updated',
       'match_available'],
      dtype='object')
  • Let us make sense of a particular row from the comp dataset. For example, if we look at the row where the competition_id is 16 and the season_id is 1, we notice that the country_name is Europe, the competition_name is Champions League, the season_name is 2017/2018, and so on. Suppose we are satisfied with this information and want to analyze a game from the 2017/18 Champions League season. We note the competition_id and season_id of that row, which are 16 and 1 respectively. Now we extract the matches dataset by typing the following:
In [7]:
mat = sb.matches(competition_id = 16, season_id = 1)
credentials were not supplied. open data access only
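  • Instead of reading the ids off the table by eye, they can also be looked up with a boolean filter. The following is a minimal sketch on a hand-built stand-in for comp (comp_demo is a hypothetical name, filled with three rows copied from the table above); with the real data the same filter would be applied to comp itself.

```python
import pandas as pd

# Toy stand-in for `comp`, built from three rows of the table above.
comp_demo = pd.DataFrame({
    "competition_id": [16, 16, 16],
    "season_id": [4, 1, 2],
    "competition_name": ["Champions League"] * 3,
    "season_name": ["2018/2019", "2017/2018", "2016/2017"],
})

# Select the row for the competition and season of interest.
mask = (
    (comp_demo["competition_name"] == "Champions League")
    & (comp_demo["season_name"] == "2017/2018")
)
row = comp_demo.loc[mask].iloc[0]

competition_id = int(row["competition_id"])
season_id = int(row["season_id"])
print(competition_id, season_id)
```

  • The two values can then be passed to sb.matches() exactly as in the cell above.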
  • The dataset mat looks like this:
In [8]:
mat
Out[8]:
match_id match_date kick_off competition season home_team away_team home_score away_score match_status match_status_360 last_updated last_updated_360 match_week competition_stage stadium referee data_version shot_fidelity_version xy_fidelity_version
0 18245 2018-05-26 20:45:00.000 Europe - Champions League 2017/2018 Real Madrid Liverpool 3 1 available unscheduled 2021-01-23T21:55:30.425330 None 7 Final NSK Olimpijs'kyj M. Mažić 1.1.0 2 2
  • Evidently, the mat dataset gives us the match ids, the match dates, the kick-off times, the home and away teams, the scores, the name of the referee who officiated the match, and so on. Here match_id is the unique id that will help us draw out event data for a particular match from the 2017/18 Champions League season. We see there is only one match available, with match_id = 18245, which was the Champions League final between Real Madrid and Liverpool ⚽ that took place at the Olimpiyskiy National Sports Complex in Kyiv and ended 3-1 in Real Madrid's favor 👀 👀 👀 👀. A great feat to be honest! Let us obtain the event data for this match.
In [9]:
events = sb.events(match_id = 18245)
credentials were not supplied. open data access only
  • The dataset events, which holds the event data for this particular match, looks like this:
In [10]:
events
Out[10]:
50_50 ball_receipt_outcome ball_recovery_recovery_failure block_offensive carry_end_location clearance_aerial_won clearance_body_part clearance_head clearance_left_foot clearance_right_foot ... shot_statsbomb_xg shot_technique shot_type substitution_outcome substitution_replacement tactics team timestamp type under_pressure
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN {'formation': 41212, 'lineup': [{'player': {'i... Real Madrid 00:00:00.000 Starting XI NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN {'formation': 433, 'lineup': [{'player': {'id'... Liverpool 00:00:00.000 Starting XI NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Real Madrid 00:00:00.000 Half Start NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Liverpool 00:00:00.000 Half Start NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Liverpool 00:00:00.000 Half Start NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3492 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Real Madrid 00:42:21.211 Offside NaN
3493 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Real Madrid 00:48:31.725 Half End NaN
3494 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Liverpool 00:48:31.725 Half End NaN
3495 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Liverpool 00:48:02.893 Half End NaN
3496 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN Real Madrid 00:48:02.893 Half End NaN

3497 rows × 86 columns

  • We see that we were able to get access to all the events from the Real Madrid vs. Liverpool match. We can print the column names to get a clearer overview of what kinds of events to expect from the match.
In [11]:
print(events.columns)
Index(['50_50', 'ball_receipt_outcome', 'ball_recovery_recovery_failure',
       'block_offensive', 'carry_end_location', 'clearance_aerial_won',
       'clearance_body_part', 'clearance_head', 'clearance_left_foot',
       'clearance_right_foot', 'counterpress', 'dribble_nutmeg',
       'dribble_outcome', 'dribble_overrun', 'duel_outcome', 'duel_type',
       'duration', 'foul_committed_advantage', 'foul_committed_card',
       'foul_committed_type', 'foul_won_advantage', 'foul_won_defensive',
       'goalkeeper_body_part', 'goalkeeper_end_location', 'goalkeeper_outcome',
       'goalkeeper_position', 'goalkeeper_punched_out', 'goalkeeper_technique',
       'goalkeeper_type', 'id', 'index', 'injury_stoppage_in_chain',
       'interception_outcome', 'location', 'match_id', 'minute', 'off_camera',
       'out', 'pass_aerial_won', 'pass_angle', 'pass_assisted_shot_id',
       'pass_body_part', 'pass_cross', 'pass_cut_back', 'pass_end_location',
       'pass_goal_assist', 'pass_height', 'pass_inswinging', 'pass_length',
       'pass_miscommunication', 'pass_outcome', 'pass_outswinging',
       'pass_recipient', 'pass_shot_assist', 'pass_straight', 'pass_switch',
       'pass_technique', 'pass_through_ball', 'pass_type', 'period',
       'play_pattern', 'player', 'position', 'possession', 'possession_team',
       'related_events', 'second', 'shot_aerial_won', 'shot_body_part',
       'shot_end_location', 'shot_first_time', 'shot_freeze_frame',
       'shot_key_pass_id', 'shot_one_on_one', 'shot_outcome', 'shot_redirect',
       'shot_statsbomb_xg', 'shot_technique', 'shot_type',
       'substitution_outcome', 'substitution_replacement', 'tactics', 'team',
       'timestamp', 'type', 'under_pressure'],
      dtype='object')
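  • With the column names in hand, a quick way to see which kinds of events a match contains is to count the values of the type column. The sketch below uses a small hand-built frame (events_demo is a hypothetical stand-in, not the real data); on the real dataset the same call would be events['type'].value_counts().

```python
import pandas as pd

# Toy stand-in for a few rows of the `events` dataset.
events_demo = pd.DataFrame({
    "type": ["Starting XI", "Starting XI", "Half Start",
             "Pass", "Pass", "Pass", "Offside"],
    "team": ["Real Madrid", "Liverpool", "Real Madrid",
             "Liverpool", "Liverpool", "Real Madrid", "Real Madrid"],
})

# Count how often each event type occurs, most frequent first.
type_counts = events_demo["type"].value_counts()
print(type_counts)
```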
  • This completes our section on how to get access to open event data for a particular football match. We will need to keep only those events on which we want to perform further analyses and build conclusions. Next, we will learn how to visualize a football pitch using mplsoccer.

Draw a Football Pitch

  • If you do not want to recreate a football pitch manually using Python (which would be rather tedious), you can simply use the mplsoccer module. To my knowledge it provides the best functionality for drawing a football pitch. This package is maintained by Anmol Durgapal and Andrew Rowlinson.

  • Keep in mind that mplsoccer can do a lot more advanced visualization besides drawing a football pitch. We will encounter these features as we move forward with other posts later. For now let us focus on visualizing a pitch in the simplest way possible. We need to pip install the package first:

In [ ]:
pip install mplsoccer
  • Note that mplsoccer requires Python 3.6+. Next we need to import matplotlib and the Pitch class.
In [12]:
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
  • Let us try to draw the simplest football pitch that satisfies our visualization needs.
In [13]:
pitch = Pitch(pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
        tight_layout = False, goal_type = 'box', label = True,  axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
  • Let us try to understand what is happening here. Personally, I like setting the pitch_color argument to 'grass', which gives the impression of a real-life football pitch. Any other color can be set instead, for example 'black' or any color given by its hex code. Omitting the stripe argument removes the darker stripes that appear on the pitch. The line_color argument is self-explanatory and can be changed as needed. By default, the axis, the labels and the ticks representing the scales are switched off. The user can turn them on by setting the label, axis and tick arguments to True, as in the pitch above. Let us draw a different pitch with its color changed and the stripes removed.
In [14]:
pitch = Pitch(pitch_color='black', line_color = 'white', constrained_layout = True,
        tight_layout = False, goal_type = 'box', label = True,  axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
  • Now let us focus on the axis range for a moment. By default, Pitch() sets the pitch type to statsbomb, where the x-axis ranges from 0 to 120 and the y-axis is inverted, ranging from 80 to 0. We will mostly be working with statsbomb data, so these orientations of the axes won't be of much concern. Nevertheless, this information is very useful and worth keeping in mind in case we deal with football data from other sources.

  • To be precise, mplsoccer provides eight different pitch types: 'statsbomb', 'opta', 'tracab', 'skillcorner', 'wyscout', 'metricasports', 'uefa', and 'custom'. The pitch type is set with the pitch_type argument of Pitch(). Let us check the orientation of the uefa pitch type:

In [15]:
pitch = Pitch(pitch_color='grass', stripe = True, pitch_type = 'uefa', line_color = 'white', constrained_layout = True,
        tight_layout = False, goal_type = 'box', label = True,  axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
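  • To make the difference between coordinate systems concrete, here is a minimal sketch of a plain linear rescaling from statsbomb coordinates to a pitch measured in metres. The helper statsbomb_to_metres is hypothetical (it is not part of mplsoccer), and the target of 105 x 68 metres with the origin at the bottom-left corner is an assumption that should be checked against the pitch type actually in use. A plain rescaling like this also ignores the fact that pitch markings sit at different relative positions in different systems, so it is only an approximation.

```python
def statsbomb_to_metres(x, y, length=105.0, width=68.0):
    """Linearly rescale a statsbomb (x, y) point to a pitch in metres.

    statsbomb: x in [0, 120], y in [0, 80], with y = 0 at the top.
    Target (an assumption): x in [0, length], y in [0, width],
    with y = 0 at the bottom.
    """
    x_m = x / 120.0 * length
    # Flip the y-axis, because the statsbomb y-axis is inverted.
    y_m = (80.0 - y) / 80.0 * width
    return x_m, y_m


centre_spot = statsbomb_to_metres(60.0, 40.0)
far_corner = statsbomb_to_metres(120.0, 0.0)
print(centre_spot, far_corner)
```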
  • The reader might have noticed that by default, the pitch has a horizontal appearance. If the user wants it to be vertical, they should pass an additional argument orientation and set it to 'vertical'.
In [16]:
pitch = Pitch(orientation = 'vertical', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
        tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
  • The user can also draw only half of the pitch by setting the view argument to 'half'.
In [17]:
pitch = Pitch(view = 'half', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
        tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
  • These are the most basic concepts of drawing and visualizing a football pitch using mplsoccer. The pitches can be customized further to meet the user's visualization needs; the mplsoccer documentation covers the available options. In the next section, we will learn how to visualize a pass network for a particular team in a match and analyze the network with the help of the NetworkX Python package. This package lets us apply basic concepts from the complex network analysis literature to the network and deduce some interesting properties from it.

Visualize a Pass Network

  • We will employ the NetworkX Python package for the analysis. Let us pip install the package:
In [ ]:
pip install networkx
  • After installing the package we will import networkx:
In [18]:
import networkx as nx
  • We will also pip install the seaborn package which is a Python package built on matplotlib and is used for generating informative and appealing statistical graphs for analysis purposes.
In [ ]:
pip install seaborn
  • Let's import seaborn too:
In [19]:
import seaborn as sns
  • If we look into the events dataset, we notice a column named tactics that provides the lineups, formations, player ids and jersey numbers of both teams. The corresponding values in the type column tell us whether a row describes the starting XI, a tactical shift, or some other development in the team. Let us build a new dataset containing only the tactics, team and type columns, keeping only the rows where tactics is not NaN.
In [20]:
# Keep only the rows whose tactics entry is present
tact = events[events['tactics'].notnull()]
tact = tact[['tactics', 'team', 'type']]
  • The tact dataset looks like:
In [21]:
tact
Out[21]:
tactics team type
0 {'formation': 41212, 'lineup': [{'player': {'i... Real Madrid Starting XI
1 {'formation': 433, 'lineup': [{'player': {'id'... Liverpool Starting XI
3489 {'formation': 433, 'lineup': [{'player': {'id'... Liverpool Tactical Shift
3490 {'formation': 433, 'lineup': [{'player': {'id'... Real Madrid Tactical Shift
3491 {'formation': 433, 'lineup': [{'player': {'id'... Real Madrid Tactical Shift
  • Let us focus only on the tactics for the starting XI of both teams. We will build and analyze the pass network generated among the starting XI players of each team. If we look at the first two rows of the type column in tact, we see that they are set to 'Starting XI', one for each team. Let us fetch the data for each team separately, filtering by type and then by team:
In [22]:
tact = tact[tact['type'] == 'Starting XI']
tact_Real = tact[tact['team'] == 'Real Madrid']
tact_Liv = tact[tact['team'] == 'Liverpool']
tact_Real = tact_Real['tactics']
tact_Liv = tact_Liv['tactics']
  • Because we selected a single column, both tact_Real and tact_Liv are pandas Series with one entry each, and they keep their original index labels (0 for Real Madrid and 1 for Liverpool), which we will use to extract the data. Each entry of the tactics column is a Python dict object. For now we are only interested in the key 'lineup', which holds all the information about the players of each team.
In [23]:
dict_Real = tact_Real[0]['lineup']
dict_Liv = tact_Liv[1]['lineup']
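  • The indices 0 and 1 used above are row labels carried over from events, not positions, which is why Liverpool needs 1 rather than 0. A minimal sketch of the difference, on a hypothetical one-entry Series (tactics_demo, with made-up placeholder players), shows that position-based access with .iloc works regardless of the label:

```python
import pandas as pd

# Toy stand-in for a filtered tactics column: a single entry that
# keeps the row label 1 from the original dataset.
tactics_demo = pd.Series(
    [{"formation": 433, "lineup": ["player A", "player B"]}],
    index=[1],
)

# Position-based access: the first (and only) entry, whatever its label.
lineup = tactics_demo.iloc[0]["lineup"]
print(lineup)

# Label-based access needs the original label (1 here).
assert tactics_demo.loc[1]["lineup"] == lineup
```

  • Using .iloc[0] can be convenient when the same code is reused for other matches, where the starting XI rows may carry different labels.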
  • We will use the from_dict() function provided by pandas to convert the dictionary into a dataframe.
In [24]:
lineup_Real = pd.DataFrame.from_dict(dict_Real)
lineup_Real
Out[24]:
player position jersey_number
0 {'id': 5597, 'name': 'Keylor Navas Gamboa'} {'id': 1, 'name': 'Goalkeeper'} 1
1 {'id': 5721, 'name': 'Daniel Carvajal Ramos'} {'id': 2, 'name': 'Right Back'} 2
2 {'id': 5485, 'name': 'Raphaël Varane'} {'id': 3, 'name': 'Right Center Back'} 5
3 {'id': 5201, 'name': 'Sergio Ramos García'} {'id': 5, 'name': 'Left Center Back'} 4
4 {'id': 5552, 'name': 'Marcelo Vieira da Silva ... {'id': 6, 'name': 'Left Back'} 12
5 {'id': 5539, 'name': 'Carlos Henrique Casimiro'} {'id': 10, 'name': 'Center Defensive Midfield'} 14
6 {'id': 5463, 'name': 'Luka Modrić'} {'id': 13, 'name': 'Right Center Midfield'} 10
7 {'id': 5574, 'name': 'Toni Kroos'} {'id': 15, 'name': 'Left Center Midfield'} 8
8 {'id': 4926, 'name': 'Francisco Román Alarcón ... {'id': 19, 'name': 'Center Attacking Midfield'} 22
9 {'id': 19677, 'name': 'Karim Benzema'} {'id': 22, 'name': 'Right Center Forward'} 9
10 {'id': 5207, 'name': 'Cristiano Ronaldo dos Sa... {'id': 24, 'name': 'Left Center Forward'} 7
In [25]:
lineup_Liv = pd.DataFrame.from_dict(dict_Liv)
lineup_Liv
Out[25]:
player position jersey_number
0 {'id': 3630, 'name': 'Loris Karius'} {'id': 1, 'name': 'Goalkeeper'} 1
1 {'id': 3664, 'name': 'Trent Alexander-Arnold'} {'id': 2, 'name': 'Right Back'} 66
2 {'id': 3471, 'name': 'Dejan Lovren'} {'id': 3, 'name': 'Right Center Back'} 6
3 {'id': 3669, 'name': 'Virgil van Dijk'} {'id': 5, 'name': 'Left Center Back'} 4
4 {'id': 3655, 'name': 'Andrew Robertson'} {'id': 6, 'name': 'Left Back'} 26
5 {'id': 3532, 'name': 'Jordan Brian Henderson'} {'id': 10, 'name': 'Center Defensive Midfield'} 14
6 {'id': 3567, 'name': 'Georginio Wijnaldum'} {'id': 13, 'name': 'Right Center Midfield'} 5
7 {'id': 3473, 'name': 'James Philip Milner'} {'id': 15, 'name': 'Left Center Midfield'} 7
8 {'id': 3531, 'name': 'Mohamed Salah'} {'id': 17, 'name': 'Right Wing'} 11
9 {'id': 3629, 'name': 'Sadio Mané'} {'id': 21, 'name': 'Left Wing'} 19
10 {'id': 3535, 'name': 'Roberto Firmino Barbosa ... {'id': 23, 'name': 'Center Forward'} 9
  • We are basically interested in the players' names and their corresponding jersey numbers. We will use a simple for loop and store the information in separate dictionaries for both teams.
In [26]:
players_Real = {}
for i in range(len(lineup_Real)):
    key = lineup_Real.player[i]['name']
    val = lineup_Real.jersey_number[i]
    players_Real[key] = str(val)
print(players_Real)
{'Keylor Navas Gamboa': '1', 'Daniel Carvajal Ramos': '2', 'Raphaël Varane': '5', 'Sergio Ramos García': '4', 'Marcelo Vieira da Silva Júnior': '12', 'Carlos Henrique Casimiro': '14', 'Luka Modrić': '10', 'Toni Kroos': '8', 'Francisco Román Alarcón Suárez': '22', 'Karim Benzema': '9', 'Cristiano Ronaldo dos Santos Aveiro': '7'}
In [27]:
players_Liv = {}
for i in range(len(lineup_Liv)):
    key = lineup_Liv.player[i]['name']
    val = lineup_Liv.jersey_number[i]
    players_Liv[key] = str(val)
print(players_Liv)
{'Loris Karius': '1', 'Trent Alexander-Arnold': '66', 'Dejan Lovren': '6', 'Virgil van Dijk': '4', 'Andrew Robertson': '26', 'Jordan Brian Henderson': '14', 'Georginio Wijnaldum': '5', 'James Philip Milner': '7', 'Mohamed Salah': '11', 'Sadio Mané': '19', 'Roberto Firmino Barbosa de Oliveira': '9'}
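  • The same mapping can also be built directly from the 'lineup' list with a dictionary comprehension, without going through a dataframe. This is only an alternative sketch; lineup_demo is a hypothetical stand-in holding two entries copied from the Real Madrid table above.

```python
# Toy stand-in for the 'lineup' list stored in the tactics dictionary.
lineup_demo = [
    {"player": {"id": 5597, "name": "Keylor Navas Gamboa"},
     "position": {"id": 1, "name": "Goalkeeper"},
     "jersey_number": 1},
    {"player": {"id": 5574, "name": "Toni Kroos"},
     "position": {"id": 15, "name": "Left Center Midfield"},
     "jersey_number": 8},
]

# Map each player's name to the jersey number, stored as a string.
players_demo = {
    entry["player"]["name"]: str(entry["jersey_number"])
    for entry in lineup_demo
}
print(players_demo)
```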
  • So, we have collected the names and the jersey numbers of the players (starting XI) of both teams in separate dictionaries named players_Real and players_Liv. These will come in handy later!

  • Now we will extract from the events dataset the columns relevant to our pass network analysis.

In [28]:
events_pn = events[['minute', 'second', 'team', 'type', 'location', 'pass_end_location', 'pass_outcome', 'player']]
  • The first 10 rows of the events_pn dataframe:
In [29]:
events_pn.head(10)
Out[29]:
minute second team type location pass_end_location pass_outcome player
0 0 0 Real Madrid Starting XI NaN NaN NaN NaN
1 0 0 Liverpool Starting XI NaN NaN NaN NaN
2 0 0 Real Madrid Half Start NaN NaN NaN NaN
3 0 0 Liverpool Half Start NaN NaN NaN NaN
4 45 0 Liverpool Half Start NaN NaN NaN NaN
5 45 0 Real Madrid Half Start NaN NaN NaN NaN
6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner
7 0 3 Liverpool Pass [35.0, 40.8] [92.7, 22.7] Incomplete Dejan Lovren
8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane
9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić
  • The last 10 rows of the events_pn dataframe:
In [30]:
events_pn.tail(10)
Out[30]:
minute second team type location pass_end_location pass_outcome player
3487 82 27 Liverpool Substitution NaN NaN NaN James Philip Milner
3488 88 21 Real Madrid Substitution NaN NaN NaN Karim Benzema
3489 31 41 Liverpool Tactical Shift NaN NaN NaN NaN
3490 61 1 Real Madrid Tactical Shift NaN NaN NaN NaN
3491 88 34 Real Madrid Tactical Shift NaN NaN NaN NaN
3492 42 21 Real Madrid Offside [114.8, 41.4] NaN NaN Karim Benzema
3493 48 31 Real Madrid Half End NaN NaN NaN NaN
3494 48 31 Liverpool Half End NaN NaN NaN NaN
3495 93 2 Liverpool Half End NaN NaN NaN NaN
3496 93 2 Real Madrid Half End NaN NaN NaN NaN
  • The next step is to filter the dataset by team and store the results as new datasets:
In [31]:
events_Real = events_pn[events_pn['team'] == 'Real Madrid']
events_Liv = events_pn[events_pn['team'] == 'Liverpool']
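  • When there are more than two groups to separate, writing one filter per group gets repetitive. As an alternative sketch, groupby can split a frame by team in a single pass. The frame events_demo and the dictionary events_by_team are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy stand-in for a few rows of events_pn.
events_demo = pd.DataFrame({
    "team": ["Real Madrid", "Liverpool", "Liverpool", "Real Madrid"],
    "type": ["Starting XI", "Starting XI", "Pass", "Pass"],
})

# groupby yields (team name, sub-frame) pairs; collect them in a dictionary.
events_by_team = {team: frame for team, frame in events_demo.groupby("team")}

print(sorted(events_by_team))
print(events_by_team["Liverpool"])
```

  • Each sub-frame keeps the original row labels, just like the two filtered datasets above.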

  • View the first 10 rows of both datasets:

In [32]:
events_Real.head(10)
Out[32]:
minute second team type location pass_end_location pass_outcome player
0 0 0 Real Madrid Starting XI NaN NaN NaN NaN
2 0 0 Real Madrid Half Start NaN NaN NaN NaN
5 45 0 Real Madrid Half Start NaN NaN NaN NaN
8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane
9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić
10 0 11 Real Madrid Pass [22.3, 76.6] [33.4, 68.0] NaN Daniel Carvajal Ramos
11 0 15 Real Madrid Pass [36.2, 75.3] [43.6, 62.0] Incomplete Carlos Henrique Casimiro
16 0 25 Real Madrid Pass [14.7, 23.2] [56.7, 6.2] Incomplete Sergio Ramos García
17 0 40 Real Madrid Pass [57.5, 4.6] [49.2, 15.6] NaN Marcelo Vieira da Silva Júnior
18 0 43 Real Madrid Pass [48.8, 18.4] [49.8, 12.5] NaN Carlos Henrique Casimiro
In [33]:
events_Liv.head(10)
Out[33]:
minute second team type location pass_end_location pass_outcome player
1 0 0 Liverpool Starting XI NaN NaN NaN NaN
3 0 0 Liverpool Half Start NaN NaN NaN NaN
4 45 0 Liverpool Half Start NaN NaN NaN NaN
6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner
7 0 3 Liverpool Pass [35.0, 40.8] [92.7, 22.7] Incomplete Dejan Lovren
12 0 16 Liverpool Pass [76.5, 18.1] [84.8, 9.5] NaN Jordan Brian Henderson
13 0 18 Liverpool Pass [84.4, 10.0] [92.5, 19.1] NaN Sadio Mané
14 0 19 Liverpool Pass [91.6, 21.3] [90.6, 50.7] NaN Roberto Firmino Barbosa de Oliveira
15 0 22 Liverpool Pass [92.2, 50.9] [109.7, 46.4] Incomplete Mohamed Salah
25 1 7 Liverpool Pass [42.0, 75.9] [115.6, 59.3] Incomplete Trent Alexander-Arnold
  • As we are only interested in the pass network generation, we will filter the datasets by keeping those rows where type is set to Pass.
In [34]:
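# .copy() gives independent dataframes, so adding columns later does not trigger SettingWithCopyWarning
events_pn_Real = events_Real[events_Real['type'] == 'Pass'].copy()
events_pn_Liv = events_Liv[events_Liv['type'] == 'Pass'].copy()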
  • Again view the first 10 rows of the filtered datasets:
In [35]:
events_pn_Real.head(10)
Out[35]:
minute second team type location pass_end_location pass_outcome player
8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane
9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić
10 0 11 Real Madrid Pass [22.3, 76.6] [33.4, 68.0] NaN Daniel Carvajal Ramos
11 0 15 Real Madrid Pass [36.2, 75.3] [43.6, 62.0] Incomplete Carlos Henrique Casimiro
16 0 25 Real Madrid Pass [14.7, 23.2] [56.7, 6.2] Incomplete Sergio Ramos García
17 0 40 Real Madrid Pass [57.5, 4.6] [49.2, 15.6] NaN Marcelo Vieira da Silva Júnior
18 0 43 Real Madrid Pass [48.8, 18.4] [49.8, 12.5] NaN Carlos Henrique Casimiro
19 0 46 Real Madrid Pass [48.8, 13.9] [36.1, 56.3] NaN Toni Kroos
20 0 52 Real Madrid Pass [41.3, 54.8] [34.4, 40.2] NaN Raphaël Varane
21 0 55 Real Madrid Pass [39.1, 36.5] [65.4, 13.1] NaN Sergio Ramos García
In [36]:
events_pn_Liv.head(10)
Out[36]:
minute second team type location pass_end_location pass_outcome player
6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner
7 0 3 Liverpool Pass [35.0, 40.8] [92.7, 22.7] Incomplete Dejan Lovren
12 0 16 Liverpool Pass [76.5, 18.1] [84.8, 9.5] NaN Jordan Brian Henderson
13 0 18 Liverpool Pass [84.4, 10.0] [92.5, 19.1] NaN Sadio Mané
14 0 19 Liverpool Pass [91.6, 21.3] [90.6, 50.7] NaN Roberto Firmino Barbosa de Oliveira
15 0 22 Liverpool Pass [92.2, 50.9] [109.7, 46.4] Incomplete Mohamed Salah
25 1 7 Liverpool Pass [42.0, 75.9] [115.6, 59.3] Incomplete Trent Alexander-Arnold
37 2 0 Liverpool Pass [9.9, 39.1] [28.1, 4.2] NaN Virgil van Dijk
38 2 3 Liverpool Pass [43.2, 2.8] [50.1, 4.8] Incomplete Andrew Robertson
39 2 7 Liverpool Pass [53.2, 0.1] [50.0, 4.0] NaN Andrew Robertson
  • Let us now observe the datasets very carefully. Suppose that, in the events_pn_Real dataset, we focus on the second and third rows (positions 1 and 2). Luka Modrić makes a pass at around 0 minutes and 10 seconds (second row), and Daniel Carvajal Ramos makes the next pass at around 0 minutes and 11 seconds (third row), so we take him to be the receiver of Modrić's pass. In both datasets we therefore add two extra columns named pass_maker and pass_receiver: pass_maker is a copy of the player column, and pass_receiver is the player column shifted up by one row, so that each row holds the player of the next pass. This can be achieved with the shift() function provided by pandas, called with -1. We will perform this operation on both events_pn_Real and events_pn_Liv.
In [37]:
events_pn_Real['pass_maker'] = events_pn_Real['player']
events_pn_Real['pass_receiver'] = events_pn_Real['player'].shift(-1)

events_pn_Liv['pass_maker'] = events_pn_Liv['player']
events_pn_Liv['pass_receiver'] = events_pn_Liv['player'].shift(-1)
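  • To see exactly what shift(-1) does, here is a minimal sketch on a three-row toy frame (passes_demo is a hypothetical name, using the first three Real Madrid passers from the table above). Each row receives the player of the following row, and the last row, which has no successor, receives NaN.

```python
import pandas as pd

# Toy stand-in for the player column of events_pn_Real.
passes_demo = pd.DataFrame({
    "player": ["Raphaël Varane", "Luka Modrić", "Daniel Carvajal Ramos"],
})

passes_demo["pass_maker"] = passes_demo["player"]
# Shift the column up by one row: row i gets the player of row i + 1.
passes_demo["pass_receiver"] = passes_demo["player"].shift(-1)

print(passes_demo[["pass_maker", "pass_receiver"]])
```

  • Note that this treats the next passer as the receiver, which is an approximation. The events dataset also contains a pass_recipient column (see the column list printed earlier), which could be compared against this result.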
  • Let us check now how the modified datasets look:
In [38]:
events_pn_Real.head(10)
Out[38]:
minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane Raphaël Varane Luka Modrić
9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić Luka Modrić Daniel Carvajal Ramos
10 0 11 Real Madrid Pass [22.3, 76.6] [33.4, 68.0] NaN Daniel Carvajal Ramos Daniel Carvajal Ramos Carlos Henrique Casimiro
11 0 15 Real Madrid Pass [36.2, 75.3] [43.6, 62.0] Incomplete Carlos Henrique Casimiro Carlos Henrique Casimiro Sergio Ramos García
16 0 25 Real Madrid Pass [14.7, 23.2] [56.7, 6.2] Incomplete Sergio Ramos García Sergio Ramos García Marcelo Vieira da Silva Júnior
17 0 40 Real Madrid Pass [57.5, 4.6] [49.2, 15.6] NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Carlos Henrique Casimiro
18 0 43 Real Madrid Pass [48.8, 18.4] [49.8, 12.5] NaN Carlos Henrique Casimiro Carlos Henrique Casimiro Toni Kroos
19 0 46 Real Madrid Pass [48.8, 13.9] [36.1, 56.3] NaN Toni Kroos Toni Kroos Raphaël Varane
20 0 52 Real Madrid Pass [41.3, 54.8] [34.4, 40.2] NaN Raphaël Varane Raphaël Varane Sergio Ramos García
21 0 55 Real Madrid Pass [39.1, 36.5] [65.4, 13.1] NaN Sergio Ramos García Sergio Ramos García Cristiano Ronaldo dos Santos Aveiro
In [39]:
events_pn_Liv.head(10)
Out[39]:
minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner James Philip Milner Dejan Lovren
7 0 3 Liverpool Pass [35.0, 40.8] [92.7, 22.7] Incomplete Dejan Lovren Dejan Lovren Jordan Brian Henderson
12 0 16 Liverpool Pass [76.5, 18.1] [84.8, 9.5] NaN Jordan Brian Henderson Jordan Brian Henderson Sadio Mané
13 0 18 Liverpool Pass [84.4, 10.0] [92.5, 19.1] NaN Sadio Mané Sadio Mané Roberto Firmino Barbosa de Oliveira
14 0 19 Liverpool Pass [91.6, 21.3] [90.6, 50.7] NaN Roberto Firmino Barbosa de Oliveira Roberto Firmino Barbosa de Oliveira Mohamed Salah
15 0 22 Liverpool Pass [92.2, 50.9] [109.7, 46.4] Incomplete Mohamed Salah Mohamed Salah Trent Alexander-Arnold
25 1 7 Liverpool Pass [42.0, 75.9] [115.6, 59.3] Incomplete Trent Alexander-Arnold Trent Alexander-Arnold Virgil van Dijk
37 2 0 Liverpool Pass [9.9, 39.1] [28.1, 4.2] NaN Virgil van Dijk Virgil van Dijk Andrew Robertson
38 2 3 Liverpool Pass [43.2, 2.8] [50.1, 4.8] Incomplete Andrew Robertson Andrew Robertson Andrew Robertson
39 2 7 Liverpool Pass [53.2, 0.1] [50.0, 4.0] NaN Andrew Robertson Andrew Robertson James Philip Milner
  • Now, some passes might not have been successful. Note that in the statsbomb data, the passes whose pass_outcome is NaN are actually the successful ones. We will filter the datasets again, keeping only the successful passes:
In [40]:
events_pn_Real = events_pn_Real[events_pn_Real['pass_outcome'].isnull() == True].reset_index()
events_pn_Liv = events_pn_Liv[events_pn_Liv['pass_outcome'].isnull() == True].reset_index()
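  • A quick sanity check on this convention is to count the outcomes with the missing values given an explicit label. The sketch below uses a hypothetical four-row frame (passes_demo); the label 'Complete' is introduced here only for display and does not appear in the statsbomb data.

```python
import numpy as np
import pandas as pd

# Toy stand-in: NaN in pass_outcome marks a successful pass.
passes_demo = pd.DataFrame({
    "player": ["James Philip Milner", "Dejan Lovren",
               "Jordan Brian Henderson", "Mohamed Salah"],
    "pass_outcome": [np.nan, "Incomplete", np.nan, "Incomplete"],
})

# Label the missing outcomes so that they show up in the count.
outcome_counts = passes_demo["pass_outcome"].fillna("Complete").value_counts()
print(outcome_counts)

# Keep the successful passes; drop=True discards the old row labels
# instead of storing them in a new 'index' column.
completed = passes_demo[passes_demo["pass_outcome"].isna()].reset_index(drop=True)
print(completed)
```

  • The cell above calls reset_index() without drop=True, which is why the filtered datasets gain an extra index column holding the original row labels.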
  • The first 10 rows of the filtered datasets:
In [41]:
events_pn_Real.head(10)
Out[41]:
index minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
0 8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane Raphaël Varane Luka Modrić
1 9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić Luka Modrić Daniel Carvajal Ramos
2 10 0 11 Real Madrid Pass [22.3, 76.6] [33.4, 68.0] NaN Daniel Carvajal Ramos Daniel Carvajal Ramos Carlos Henrique Casimiro
3 17 0 40 Real Madrid Pass [57.5, 4.6] [49.2, 15.6] NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Carlos Henrique Casimiro
4 18 0 43 Real Madrid Pass [48.8, 18.4] [49.8, 12.5] NaN Carlos Henrique Casimiro Carlos Henrique Casimiro Toni Kroos
5 19 0 46 Real Madrid Pass [48.8, 13.9] [36.1, 56.3] NaN Toni Kroos Toni Kroos Raphaël Varane
6 20 0 52 Real Madrid Pass [41.3, 54.8] [34.4, 40.2] NaN Raphaël Varane Raphaël Varane Sergio Ramos García
7 21 0 55 Real Madrid Pass [39.1, 36.5] [65.4, 13.1] NaN Sergio Ramos García Sergio Ramos García Cristiano Ronaldo dos Santos Aveiro
8 22 0 58 Real Madrid Pass [64.5, 11.1] [54.2, 5.6] NaN Cristiano Ronaldo dos Santos Aveiro Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior
9 23 0 59 Real Madrid Pass [55.3, 5.5] [83.9, 4.3] NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Karim Benzema
In [42]:
events_pn_Liv.head(10)
Out[42]:
index minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
0 6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner James Philip Milner Dejan Lovren
1 12 0 16 Liverpool Pass [76.5, 18.1] [84.8, 9.5] NaN Jordan Brian Henderson Jordan Brian Henderson Sadio Mané
2 13 0 18 Liverpool Pass [84.4, 10.0] [92.5, 19.1] NaN Sadio Mané Sadio Mané Roberto Firmino Barbosa de Oliveira
3 14 0 19 Liverpool Pass [91.6, 21.3] [90.6, 50.7] NaN Roberto Firmino Barbosa de Oliveira Roberto Firmino Barbosa de Oliveira Mohamed Salah
4 37 2 0 Liverpool Pass [9.9, 39.1] [28.1, 4.2] NaN Virgil van Dijk Virgil van Dijk Andrew Robertson
5 39 2 7 Liverpool Pass [53.2, 0.1] [50.0, 4.0] NaN Andrew Robertson Andrew Robertson James Philip Milner
6 40 2 10 Liverpool Pass [45.5, 4.0] [27.4, 16.8] NaN James Philip Milner James Philip Milner Virgil van Dijk
7 41 2 13 Liverpool Pass [26.7, 19.6] [27.8, 47.3] NaN Virgil van Dijk Virgil van Dijk Dejan Lovren
8 42 2 16 Liverpool Pass [28.0, 45.4] [28.4, 21.4] NaN Dejan Lovren Dejan Lovren Virgil van Dijk
9 43 2 19 Liverpool Pass [30.4, 25.7] [30.7, 52.9] NaN Virgil van Dijk Virgil van Dijk Dejan Lovren
  • So it seems we have been able to clean and modify the datasets sensibly. From here on we focus only on building the pass network among the players who were in the starting 11 of each team, so we will discard the rows containing pass events that took place after the first substitution of the respective team. Let us find the minute and second of the first substitution for both Real Madrid and Liverpool.

  • Now, let us filter the datasets events_Real and events_Liv by setting the type to Substitution. This tells us when the first substitution took place for each team.

In [43]:
substitution_Real = events_Real[events_Real['type'] == 'Substitution']
substitution_Liv = events_Liv[events_Liv['type'] == 'Substitution']
  • And let us view the datasets:
In [44]:
substitution_Real
Out[44]:
minute second team type location pass_end_location pass_outcome player
3485 36 17 Real Madrid Substitution NaN NaN NaN Daniel Carvajal Ramos
3486 60 56 Real Madrid Substitution NaN NaN NaN Francisco Román Alarcón Suárez
3488 88 21 Real Madrid Substitution NaN NaN NaN Karim Benzema
In [45]:
substitution_Liv
Out[45]:
minute second team type location pass_end_location pass_outcome player
3484 29 39 Liverpool Substitution NaN NaN NaN Mohamed Salah
3487 82 27 Liverpool Substitution NaN NaN NaN James Philip Milner
  • We see that the first substitution for Real Madrid takes place at 36 minutes and 17 seconds, whereas for Liverpool it takes place at 29 minutes and 39 seconds. Let us extract these values with a short piece of Python code:
In [46]:
substitution_Real_minute = np.min(substitution_Real['minute'])
substitution_Real_minute_data = substitution_Real[substitution_Real['minute'] == substitution_Real_minute]
substitution_Real_second = np.min(substitution_Real_minute_data['second'])
print("minute =", substitution_Real_minute, "second =",  substitution_Real_second)
minute = 36 second = 17
In [47]:
substitution_Liv_minute = np.min(substitution_Liv['minute'])
substitution_Liv_minute_data = substitution_Liv[substitution_Liv['minute'] == substitution_Liv_minute]
substitution_Liv_second = np.min(substitution_Liv_minute_data['second'])
print("minute = ", substitution_Liv_minute, "second = ", substitution_Liv_second)
minute =  29 second =  39
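  • As a cross-check on the two-step minimum used above, a minimal sketch can sort the substitutions by minute and then by second and take the first row. The small table below is a stand-in holding the three Real Madrid substitution times shown earlier, not the real events dataframe, and the names subs and first are illustrative only:

```python
import pandas as pd

# Stand-in for substitution_Real: the three substitution times listed above,
# deliberately out of order to show that sorting handles it.
subs = pd.DataFrame({
    "minute": [60, 88, 36],
    "second": [56, 21, 17],
})

# Sort chronologically and take the earliest row.
first = subs.sort_values(["minute", "second"]).iloc[0]
first_minute, first_second = int(first["minute"]), int(first["second"])
print("minute =", first_minute, "second =", first_second)
```

Sorting on both columns at once avoids the intermediate dataframe that holds only the rows of the earliest minute.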
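  • These match the timings we read from the tables above. Now we filter our datasets, keeping those pass events that took place up to and including the minute of the first substitution. Note that the filter below compares only the minute, so passes made later in that same minute, after the substitution itself, are also retained: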
In [48]:
events_pn_Real = events_pn_Real[(events_pn_Real['minute'] <= substitution_Real_minute)]

events_pn_Liv = events_pn_Liv[(events_pn_Liv['minute'] <= substitution_Liv_minute)]
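  • The filter above compares only the minute column. If passes made within the substitution minute but after the substitution second should also be dropped, one possible sketch converts minute and second to elapsed seconds before comparing. The toy passes table is made up for illustration, and the sketch assumes that minute and second alone order the events (first-half stoppage time is not treated specially):

```python
import pandas as pd

# Toy pass events (hypothetical values, not the match data).
passes = pd.DataFrame({
    "minute": [28, 29, 29, 29, 30],
    "second": [50, 10, 39, 45, 5],
    "pass_maker": ["A", "B", "C", "D", "E"],
})

# First substitution time, here the Liverpool values found above.
sub_minute, sub_second = 29, 39

# Keep only passes strictly before the substitution, to the second.
elapsed = passes["minute"] * 60 + passes["second"]
before_sub = passes[elapsed < sub_minute * 60 + sub_second]

# For comparison: the minute-only filter keeps the whole 29th-minute block.
minute_only = passes[passes["minute"] <= sub_minute]
print(len(before_sub), len(minute_only))
```

Whether a pass at exactly the substitution second should count is a judgment call; the strict comparison here excludes it.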
  • Let us again print the first 10 rows of the filtered datasets:
In [49]:
events_pn_Real.head(10)
Out[49]:
index minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
0 8 0 8 Real Madrid Pass [27.4, 60.2] [36.1, 71.6] NaN Raphaël Varane Raphaël Varane Luka Modrić
1 9 0 10 Real Madrid Pass [35.3, 75.4] [22.4, 76.6] NaN Luka Modrić Luka Modrić Daniel Carvajal Ramos
2 10 0 11 Real Madrid Pass [22.3, 76.6] [33.4, 68.0] NaN Daniel Carvajal Ramos Daniel Carvajal Ramos Carlos Henrique Casimiro
3 17 0 40 Real Madrid Pass [57.5, 4.6] [49.2, 15.6] NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Carlos Henrique Casimiro
4 18 0 43 Real Madrid Pass [48.8, 18.4] [49.8, 12.5] NaN Carlos Henrique Casimiro Carlos Henrique Casimiro Toni Kroos
5 19 0 46 Real Madrid Pass [48.8, 13.9] [36.1, 56.3] NaN Toni Kroos Toni Kroos Raphaël Varane
6 20 0 52 Real Madrid Pass [41.3, 54.8] [34.4, 40.2] NaN Raphaël Varane Raphaël Varane Sergio Ramos García
7 21 0 55 Real Madrid Pass [39.1, 36.5] [65.4, 13.1] NaN Sergio Ramos García Sergio Ramos García Cristiano Ronaldo dos Santos Aveiro
8 22 0 58 Real Madrid Pass [64.5, 11.1] [54.2, 5.6] NaN Cristiano Ronaldo dos Santos Aveiro Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior
9 23 0 59 Real Madrid Pass [55.3, 5.5] [83.9, 4.3] NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Karim Benzema
In [50]:
events_pn_Liv.head(10)
Out[50]:
index minute second team type location pass_end_location pass_outcome player pass_maker pass_receiver
0 6 0 0 Liverpool Pass [60.0, 40.0] [32.1, 41.2] NaN James Philip Milner James Philip Milner Dejan Lovren
1 12 0 16 Liverpool Pass [76.5, 18.1] [84.8, 9.5] NaN Jordan Brian Henderson Jordan Brian Henderson Sadio Mané
2 13 0 18 Liverpool Pass [84.4, 10.0] [92.5, 19.1] NaN Sadio Mané Sadio Mané Roberto Firmino Barbosa de Oliveira
3 14 0 19 Liverpool Pass [91.6, 21.3] [90.6, 50.7] NaN Roberto Firmino Barbosa de Oliveira Roberto Firmino Barbosa de Oliveira Mohamed Salah
4 37 2 0 Liverpool Pass [9.9, 39.1] [28.1, 4.2] NaN Virgil van Dijk Virgil van Dijk Andrew Robertson
5 39 2 7 Liverpool Pass [53.2, 0.1] [50.0, 4.0] NaN Andrew Robertson Andrew Robertson James Philip Milner
6 40 2 10 Liverpool Pass [45.5, 4.0] [27.4, 16.8] NaN James Philip Milner James Philip Milner Virgil van Dijk
7 41 2 13 Liverpool Pass [26.7, 19.6] [27.8, 47.3] NaN Virgil van Dijk Virgil van Dijk Dejan Lovren
8 42 2 16 Liverpool Pass [28.0, 45.4] [28.4, 21.4] NaN Dejan Lovren Dejan Lovren Virgil van Dijk
9 43 2 19 Liverpool Pass [30.4, 25.7] [30.7, 52.9] NaN Virgil van Dijk Virgil van Dijk Dejan Lovren
  • Now from the datasets, we will split the location and the pass_end_location columns into two columns each representing the coordinates and name them as pass_maker_x, pass_maker_y, pass_receiver_x and pass_receiver_y.

  • Let us manipulate the dataset for Real Madrid first:

In [51]:
Loc = events_pn_Real['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'])

Loc_end = events_pn_Real['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'])

events_pn_Real['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Real['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Real['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Real['pass_receiver_y'] = Loc_end['pass_receiver_y']

events_pn_Real = events_pn_Real[['index', 'minute', 'second', 'team', 'type', 'pass_outcome', 
                                 'player', 'pass_maker', 'pass_receiver', 'pass_maker_x', 
                                 'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
In [52]:
events_pn_Real.head(10)
Out[52]:
index minute second team type pass_outcome player pass_maker pass_receiver pass_maker_x pass_maker_y pass_receiver_x pass_receiver_y
0 8 0 8 Real Madrid Pass NaN Raphaël Varane Raphaël Varane Luka Modrić 27.4 60.2 36.1 71.6
1 9 0 10 Real Madrid Pass NaN Luka Modrić Luka Modrić Daniel Carvajal Ramos 35.3 75.4 22.4 76.6
2 10 0 11 Real Madrid Pass NaN Daniel Carvajal Ramos Daniel Carvajal Ramos Carlos Henrique Casimiro 22.3 76.6 33.4 68.0
3 17 0 40 Real Madrid Pass NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Carlos Henrique Casimiro 57.5 4.6 49.2 15.6
4 18 0 43 Real Madrid Pass NaN Carlos Henrique Casimiro Carlos Henrique Casimiro Toni Kroos 48.8 18.4 49.8 12.5
5 19 0 46 Real Madrid Pass NaN Toni Kroos Toni Kroos Raphaël Varane 48.8 13.9 36.1 56.3
6 20 0 52 Real Madrid Pass NaN Raphaël Varane Raphaël Varane Sergio Ramos García 41.3 54.8 34.4 40.2
7 21 0 55 Real Madrid Pass NaN Sergio Ramos García Sergio Ramos García Cristiano Ronaldo dos Santos Aveiro 39.1 36.5 65.4 13.1
8 22 0 58 Real Madrid Pass NaN Cristiano Ronaldo dos Santos Aveiro Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior 64.5 11.1 54.2 5.6
9 23 0 59 Real Madrid Pass NaN Marcelo Vieira da Silva Júnior Marcelo Vieira da Silva Júnior Karim Benzema 55.3 5.5 83.9 4.3
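  • The assignments above align Loc and Loc_end with events_pn_Real by index, which works as long as events_pn_Real still carries a 0, 1, 2, ... index. A slightly more defensive sketch, which assumes nothing about the index, passes the original index along explicitly. The toy dataframe and the name split are illustrative only:

```python
import pandas as pd

# Toy events with a deliberately non-contiguous index.
events = pd.DataFrame({
    "location": [[27.4, 60.2], [35.3, 75.4], [22.3, 76.6]],
    "pass_end_location": [[36.1, 71.6], [22.4, 76.6], [33.4, 68.0]],
}, index=[3, 7, 9])

# Expand each list column into two columns, reusing the original index
# so that rows stay aligned when the pieces are joined back together.
start = pd.DataFrame(events["location"].to_list(),
                     columns=["pass_maker_x", "pass_maker_y"],
                     index=events.index)
end = pd.DataFrame(events["pass_end_location"].to_list(),
                   columns=["pass_receiver_x", "pass_receiver_y"],
                   index=events.index)

split = pd.concat(
    [events.drop(columns=["location", "pass_end_location"]), start, end],
    axis=1,
)
print(split)
```

Without index=events.index the new frames would be indexed 0 to 2, and the column assignment would leave missing values in rows 3, 7 and 9.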
  • Same data manipulation for Liverpool:
In [53]:
Loc = events_pn_Liv['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'])

Loc_end = events_pn_Liv['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'])

events_pn_Liv['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Liv['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Liv['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Liv['pass_receiver_y'] = Loc_end['pass_receiver_y']

events_pn_Liv = events_pn_Liv[['index', 'minute', 'second', 'team', 'type', 'pass_outcome', 
                               'player', 'pass_maker', 'pass_receiver', 'pass_maker_x', 
                               'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
In [54]:
events_pn_Liv.head(10)
Out[54]:
index minute second team type pass_outcome player pass_maker pass_receiver pass_maker_x pass_maker_y pass_receiver_x pass_receiver_y
0 6 0 0 Liverpool Pass NaN James Philip Milner James Philip Milner Dejan Lovren 60.0 40.0 32.1 41.2
1 12 0 16 Liverpool Pass NaN Jordan Brian Henderson Jordan Brian Henderson Sadio Mané 76.5 18.1 84.8 9.5
2 13 0 18 Liverpool Pass NaN Sadio Mané Sadio Mané Roberto Firmino Barbosa de Oliveira 84.4 10.0 92.5 19.1
3 14 0 19 Liverpool Pass NaN Roberto Firmino Barbosa de Oliveira Roberto Firmino Barbosa de Oliveira Mohamed Salah 91.6 21.3 90.6 50.7
4 37 2 0 Liverpool Pass NaN Virgil van Dijk Virgil van Dijk Andrew Robertson 9.9 39.1 28.1 4.2
5 39 2 7 Liverpool Pass NaN Andrew Robertson Andrew Robertson James Philip Milner 53.2 0.1 50.0 4.0
6 40 2 10 Liverpool Pass NaN James Philip Milner James Philip Milner Virgil van Dijk 45.5 4.0 27.4 16.8
7 41 2 13 Liverpool Pass NaN Virgil van Dijk Virgil van Dijk Dejan Lovren 26.7 19.6 27.8 47.3
8 42 2 16 Liverpool Pass NaN Dejan Lovren Dejan Lovren Virgil van Dijk 28.0 45.4 28.4 21.4
9 43 2 19 Liverpool Pass NaN Virgil van Dijk Virgil van Dijk Dejan Lovren 30.4 25.7 30.7 52.9
  • Inspired by the approach given here, we will take the average locations from which the starting 11 players made their passes as their positions in the pass network, and will also count the number of passes made by each of these players:
In [55]:
av_loc_Real = events_pn_Real.groupby('pass_maker').agg({'pass_maker_x':['mean'], 
                                                        'pass_maker_y':['mean', 'count']})
In [56]:
av_loc_Real
Out[56]:
pass_maker_x pass_maker_y
mean mean count
pass_maker
Carlos Henrique Casimiro 60.845455 31.836364 11
Cristiano Ronaldo dos Santos Aveiro 81.580000 29.160000 10
Daniel Carvajal Ramos 64.341667 73.875000 24
Francisco Román Alarcón Suárez 62.323529 27.082353 17
Karim Benzema 65.081818 27.936364 11
Keylor Navas Gamboa 10.870000 41.810000 10
Luka Modrić 60.604762 55.028571 21
Marcelo Vieira da Silva Júnior 59.865217 11.130435 23
Raphaël Varane 37.436364 58.354545 22
Sergio Ramos García 41.282353 24.514706 34
Toni Kroos 51.190000 24.275000 40
  • As we see, the groupby() function from pandas splits events_pn_Real into groups indexed by the pass makers' names, while the agg() function aggregates each group into the average of the pass maker's x and y locations and counts the number of passes made by that player. Now simplify the column names of av_loc_Real:
In [57]:
av_loc_Real.columns = ['pass_maker_x', 'pass_maker_y', 'count']
In [58]:
av_loc_Real
Out[58]:
pass_maker_x pass_maker_y count
pass_maker
Carlos Henrique Casimiro 60.845455 31.836364 11
Cristiano Ronaldo dos Santos Aveiro 81.580000 29.160000 10
Daniel Carvajal Ramos 64.341667 73.875000 24
Francisco Román Alarcón Suárez 62.323529 27.082353 17
Karim Benzema 65.081818 27.936364 11
Keylor Navas Gamboa 10.870000 41.810000 10
Luka Modrić 60.604762 55.028571 21
Marcelo Vieira da Silva Júnior 59.865217 11.130435 23
Raphaël Varane 37.436364 58.354545 22
Sergio Ramos García 41.282353 24.514706 34
Toni Kroos 51.190000 24.275000 40
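  • The aggregate-then-rename steps above can also be written in one go with pandas named aggregation (available in reasonably recent pandas versions), which produces flat column names directly. This is only a sketch on made-up coordinates; the toy dataframe and its values are not taken from the match:

```python
import pandas as pd

# Toy pass events (hypothetical coordinates).
toy = pd.DataFrame({
    "pass_maker": ["Toni Kroos", "Toni Kroos", "Luka Modrić"],
    "pass_maker_x": [50.0, 52.0, 60.0],
    "pass_maker_y": [24.0, 26.0, 55.0],
})

# Each keyword names an output column: (input column, aggregation).
av_loc = toy.groupby("pass_maker").agg(
    pass_maker_x=("pass_maker_x", "mean"),
    pass_maker_y=("pass_maker_y", "mean"),
    count=("pass_maker_y", "count"),
)
print(av_loc)
```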
  • Now perform the same operations for Liverpool:
In [59]:
av_loc_Liv = events_pn_Liv.groupby('pass_maker').agg({'pass_maker_x':['mean'], 
                                                      'pass_maker_y':['mean', 'count']})
av_loc_Liv.columns = ['pass_maker_x', 'pass_maker_y', 'count']
In [60]:
av_loc_Liv
Out[60]:
pass_maker_x pass_maker_y count
pass_maker
Andrew Robertson 59.815385 6.830769 13
Dejan Lovren 41.690909 60.172727 11
Georginio Wijnaldum 76.390909 28.518182 11
James Philip Milner 72.353333 36.153333 15
Jordan Brian Henderson 61.035294 37.152941 17
Loris Karius 12.914286 40.385714 7
Mohamed Salah 77.550000 64.710000 10
Roberto Firmino Barbosa de Oliveira 78.250000 43.570000 10
Sadio Mané 86.275000 22.075000 4
Trent Alexander-Arnold 64.666667 72.550000 12
Virgil van Dijk 43.366667 25.433333 9
  • Once we have the average locations of the starting 11 pass makers, we will work out the number of times a particular pass maker passed the ball to a particular pass receiver. Keep the direction of the pass in mind, i.e., a pass from player A to player B is not identical to a pass from player B to player A. We will use the groupby() and count() functions to count the number of rows in which player A passed the ball to player B.
In [61]:
pass_Real = events_pn_Real.groupby(['pass_maker', 'pass_receiver']).index.count().reset_index()
In [62]:
pass_Real.head(10)
Out[62]:
pass_maker pass_receiver index
0 Carlos Henrique Casimiro Daniel Carvajal Ramos 1
1 Carlos Henrique Casimiro Luka Modrić 1
2 Carlos Henrique Casimiro Marcelo Vieira da Silva Júnior 1
3 Carlos Henrique Casimiro Raphaël Varane 1
4 Carlos Henrique Casimiro Sergio Ramos García 1
5 Carlos Henrique Casimiro Toni Kroos 6
6 Cristiano Ronaldo dos Santos Aveiro Daniel Carvajal Ramos 3
7 Cristiano Ronaldo dos Santos Aveiro Karim Benzema 1
8 Cristiano Ronaldo dos Santos Aveiro Luka Modrić 1
9 Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior 4
In [63]:
pass_Liv = events_pn_Liv.groupby(['pass_maker', 'pass_receiver']).index.count().reset_index()
In [64]:
pass_Liv.head(10)
Out[64]:
pass_maker pass_receiver index
0 Andrew Robertson Andrew Robertson 1
1 Andrew Robertson Georginio Wijnaldum 3
2 Andrew Robertson James Philip Milner 3
3 Andrew Robertson Jordan Brian Henderson 2
4 Andrew Robertson Roberto Firmino Barbosa de Oliveira 2
5 Andrew Robertson Virgil van Dijk 2
6 Dejan Lovren James Philip Milner 1
7 Dejan Lovren Jordan Brian Henderson 1
8 Dejan Lovren Loris Karius 2
9 Dejan Lovren Mohamed Salah 1
  • Let's rename the index column to number_of_passes:
In [65]:
pass_Real.rename(columns = {'index':'number_of_passes'}, inplace = True)
In [66]:
pass_Real.head(10)
Out[66]:
pass_maker pass_receiver number_of_passes
0 Carlos Henrique Casimiro Daniel Carvajal Ramos 1
1 Carlos Henrique Casimiro Luka Modrić 1
2 Carlos Henrique Casimiro Marcelo Vieira da Silva Júnior 1
3 Carlos Henrique Casimiro Raphaël Varane 1
4 Carlos Henrique Casimiro Sergio Ramos García 1
5 Carlos Henrique Casimiro Toni Kroos 6
6 Cristiano Ronaldo dos Santos Aveiro Daniel Carvajal Ramos 3
7 Cristiano Ronaldo dos Santos Aveiro Karim Benzema 1
8 Cristiano Ronaldo dos Santos Aveiro Luka Modrić 1
9 Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior 4
In [67]:
pass_Liv.rename(columns = {'index':'number_of_passes'}, inplace = True)
In [68]:
pass_Liv.head(10)
Out[68]:
pass_maker pass_receiver number_of_passes
0 Andrew Robertson Andrew Robertson 1
1 Andrew Robertson Georginio Wijnaldum 3
2 Andrew Robertson James Philip Milner 3
3 Andrew Robertson Jordan Brian Henderson 2
4 Andrew Robertson Roberto Firmino Barbosa de Oliveira 2
5 Andrew Robertson Virgil van Dijk 2
6 Dejan Lovren James Philip Milner 1
7 Dejan Lovren Jordan Brian Henderson 1
8 Dejan Lovren Loris Karius 2
9 Dejan Lovren Mohamed Salah 1
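  • The count-then-rename steps above can be combined, if preferred, by using size() and naming the result column in reset_index(). The following is a sketch on a made-up table of four passes, with players called A, B and C for brevity:

```python
import pandas as pd

# Toy pass events: one row per pass (hypothetical players).
toy = pd.DataFrame({
    "pass_maker": ["A", "A", "B", "A"],
    "pass_receiver": ["B", "B", "A", "C"],
})

# Count rows per directed (maker, receiver) pair and name the column directly.
pair_counts = (
    toy.groupby(["pass_maker", "pass_receiver"])
       .size()
       .reset_index(name="number_of_passes")
)
print(pair_counts)
```

Note that A to B and B to A are counted separately, which is exactly the directedness the pass network relies on.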
  • Now, we will merge the datasets pass_Real and av_loc_Real. Let us first identify the left and the right dataframes for the merge: since we call merge() on pass_Real, it is the left dataframe, and av_loc_Real, passed as the argument, is the right. We will use the merge() function from pandas to carry out the merging operation.
In [69]:
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_maker', right_index = True)
In [70]:
pass_Real.head(10)
Out[70]:
pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count
0 Carlos Henrique Casimiro Daniel Carvajal Ramos 1 60.845455 31.836364 11
1 Carlos Henrique Casimiro Luka Modrić 1 60.845455 31.836364 11
2 Carlos Henrique Casimiro Marcelo Vieira da Silva Júnior 1 60.845455 31.836364 11
3 Carlos Henrique Casimiro Raphaël Varane 1 60.845455 31.836364 11
4 Carlos Henrique Casimiro Sergio Ramos García 1 60.845455 31.836364 11
5 Carlos Henrique Casimiro Toni Kroos 6 60.845455 31.836364 11
6 Cristiano Ronaldo dos Santos Aveiro Daniel Carvajal Ramos 3 81.580000 29.160000 10
7 Cristiano Ronaldo dos Santos Aveiro Karim Benzema 1 81.580000 29.160000 10
8 Cristiano Ronaldo dos Santos Aveiro Luka Modrić 1 81.580000 29.160000 10
9 Cristiano Ronaldo dos Santos Aveiro Marcelo Vieira da Silva Júnior 4 81.580000 29.160000 10

The left_on argument specifies the column of the left dataframe (pass_Real) to use as the join key, and right_index = True tells pandas to use the index of the right dataframe (av_loc_Real) as its join key. Let us do the same operation for the other team:

In [71]:
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_maker', right_index = True)
In [72]:
pass_Liv.head(10)
Out[72]:
pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count
0 Andrew Robertson Andrew Robertson 1 59.815385 6.830769 13
1 Andrew Robertson Georginio Wijnaldum 3 59.815385 6.830769 13
2 Andrew Robertson James Philip Milner 3 59.815385 6.830769 13
3 Andrew Robertson Jordan Brian Henderson 2 59.815385 6.830769 13
4 Andrew Robertson Roberto Firmino Barbosa de Oliveira 2 59.815385 6.830769 13
5 Andrew Robertson Virgil van Dijk 2 59.815385 6.830769 13
6 Dejan Lovren James Philip Milner 1 41.690909 60.172727 11
7 Dejan Lovren Jordan Brian Henderson 1 41.690909 60.172727 11
8 Dejan Lovren Loris Karius 2 41.690909 60.172727 11
9 Dejan Lovren Mohamed Salah 1 41.690909 60.172727 11
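  • To see the left_on / right_index mechanics in isolation, here is a minimal sketch on two tiny made-up tables. It also shows how the suffixes argument keeps the columns apart when the same location table is merged a second time on the receiver. All names and numbers are illustrative:

```python
import pandas as pd

# Right table: one row per player, indexed by player name.
av_loc = pd.DataFrame(
    {"x": [10.0, 50.0], "y": [40.0, 20.0]},
    index=pd.Index(["A", "B"], name="player"),
)

# Left table: one row per directed pair of players.
pairs = pd.DataFrame({
    "pass_maker": ["A", "B"],
    "pass_receiver": ["B", "A"],
    "number_of_passes": [3, 5],
})

# Join the left column 'pass_maker' against the right table's index.
with_maker = pairs.merge(av_loc, left_on="pass_maker", right_index=True)

# Second merge on the receiver; overlapping x, y columns from the right
# table get the '_receiver' suffix, the left ones stay unchanged.
with_both = with_maker.merge(av_loc, left_on="pass_receiver",
                             right_index=True, suffixes=("", "_receiver"))
print(with_both)
```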
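  • Finally, we will perform one more merge on these updated datasets to add the average locations of the pass receivers. The count column merged in this step comes from av_loc_Real and av_loc_Liv, so the column renamed to number_of_passes_received below actually holds the number of passes the receiver made in the filtered data, not the number received. A last touch of data cleaning, dropping rows in which the pass maker and the pass receiver are the same player, gives us datasets sufficient to start visualizing the pass networks for both teams: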
In [73]:
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_receiver', 
                            right_index = True, suffixes = ['', '_receipt'])
pass_Real.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x', 
                            'pass_maker_y_receipt':'pass_receiver_y', 
                            'count_receipt':'number_of_passes_received'}, inplace = True)
pass_Real = pass_Real[pass_Real['pass_maker'] != pass_Real['pass_receiver']].reset_index()
In [74]:
pass_Real
Out[74]:
index pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count pass_receiver_x pass_receiver_y number_of_passes_received
0 0 Carlos Henrique Casimiro Daniel Carvajal Ramos 1 60.845455 31.836364 11 64.341667 73.875 24
1 6 Cristiano Ronaldo dos Santos Aveiro Daniel Carvajal Ramos 3 81.580000 29.160000 10 64.341667 73.875 24
2 21 Francisco Román Alarcón Suárez Daniel Carvajal Ramos 2 62.323529 27.082353 17 64.341667 73.875 24
3 29 Karim Benzema Daniel Carvajal Ramos 2 65.081818 27.936364 11 64.341667 73.875 24
4 39 Luka Modrić Daniel Carvajal Ramos 10 60.604762 55.028571 21 64.341667 73.875 24
... ... ... ... ... ... ... ... ... ... ...
73 16 Daniel Carvajal Ramos Keylor Navas Gamboa 1 64.341667 73.875000 24 10.870000 41.810 10
74 30 Karim Benzema Keylor Navas Gamboa 1 65.081818 27.936364 11 10.870000 41.810 10
75 57 Raphaël Varane Keylor Navas Gamboa 2 37.436364 58.354545 22 10.870000 41.810 10
76 64 Sergio Ramos García Keylor Navas Gamboa 1 41.282353 24.514706 34 10.870000 41.810 10
77 74 Toni Kroos Keylor Navas Gamboa 1 51.190000 24.275000 40 10.870000 41.810 10

78 rows × 10 columns

In [75]:
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_receiver', 
                          right_index = True, suffixes = ['', '_receipt'])
pass_Liv.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x', 
                           'pass_maker_y_receipt':'pass_receiver_y', 
                           'count_receipt':'number_of_passes_received'}, inplace = True)
pass_Liv = pass_Liv[pass_Liv['pass_maker'] != pass_Liv['pass_receiver']].reset_index()
In [76]:
pass_Liv
Out[76]:
index pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count pass_receiver_x pass_receiver_y number_of_passes_received
0 12 Georginio Wijnaldum Andrew Robertson 4 76.390909 28.518182 11 59.815385 6.830769 13
1 18 James Philip Milner Andrew Robertson 1 72.353333 36.153333 15 59.815385 6.830769 13
2 28 Jordan Brian Henderson Andrew Robertson 1 61.035294 37.152941 17 59.815385 6.830769 13
3 36 Loris Karius Andrew Robertson 1 12.914286 40.385714 7 59.815385 6.830769 13
4 54 Trent Alexander-Arnold Andrew Robertson 1 64.666667 72.550000 12 59.815385 6.830769 13
... ... ... ... ... ... ... ... ... ... ...
59 55 Trent Alexander-Arnold Dejan Lovren 1 64.666667 72.550000 12 41.690909 60.172727 11
60 61 Virgil van Dijk Dejan Lovren 3 43.366667 25.433333 9 41.690909 60.172727 11
61 25 James Philip Milner Sadio Mané 2 72.353333 36.153333 15 86.275000 22.075000 4
62 33 Jordan Brian Henderson Sadio Mané 1 61.035294 37.152941 17 86.275000 22.075000 4
63 43 Mohamed Salah Sadio Mané 1 77.550000 64.710000 10 86.275000 22.075000 4

64 rows × 10 columns
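  • In the cells above, count_receipt is taken from the average-location tables, so it counts passes made by the receiving player. If the number of passes a player actually received is wanted, one possible sketch sums number_of_passes per receiver and merges that in instead. The toy table and the column name passes_received are assumptions made for illustration:

```python
import pandas as pd

# Toy table of directed pair counts (hypothetical players A, B, C).
pair_counts = pd.DataFrame({
    "pass_maker": ["A", "A", "B", "C"],
    "pass_receiver": ["B", "C", "A", "A"],
    "number_of_passes": [2, 1, 3, 4],
})

# Total passes received by each player, summed over all pass makers.
received = (
    pair_counts.groupby("pass_receiver")["number_of_passes"]
               .sum()
               .rename("passes_received")
)

# Attach the receiver's total to every row in which that player receives.
merged = pair_counts.merge(received, left_on="pass_receiver", right_index=True)
print(merged)
```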

  • We will replace the player names with their jersey numbers and create another pair of new datasets:
In [77]:
pass_Real_new = pass_Real.replace({"pass_maker": players_Real, "pass_receiver": players_Real})
In [78]:
pass_Real_new
Out[78]:
index pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count pass_receiver_x pass_receiver_y number_of_passes_received
0 0 14 2 1 60.845455 31.836364 11 64.341667 73.875 24
1 6 7 2 3 81.580000 29.160000 10 64.341667 73.875 24
2 21 22 2 2 62.323529 27.082353 17 64.341667 73.875 24
3 29 9 2 2 65.081818 27.936364 11 64.341667 73.875 24
4 39 10 2 10 60.604762 55.028571 21 64.341667 73.875 24
... ... ... ... ... ... ... ... ... ... ...
73 16 2 1 1 64.341667 73.875000 24 10.870000 41.810 10
74 30 9 1 1 65.081818 27.936364 11 10.870000 41.810 10
75 57 5 1 2 37.436364 58.354545 22 10.870000 41.810 10
76 64 4 1 1 41.282353 24.514706 34 10.870000 41.810 10
77 74 8 1 1 51.190000 24.275000 40 10.870000 41.810 10

78 rows × 10 columns

In [79]:
pass_Liv_new = pass_Liv.replace({"pass_maker": players_Liv, "pass_receiver": players_Liv})
In [80]:
pass_Liv_new
Out[80]:
index pass_maker pass_receiver number_of_passes pass_maker_x pass_maker_y count pass_receiver_x pass_receiver_y number_of_passes_received
0 12 5 26 4 76.390909 28.518182 11 59.815385 6.830769 13
1 18 7 26 1 72.353333 36.153333 15 59.815385 6.830769 13
2 28 14 26 1 61.035294 37.152941 17 59.815385 6.830769 13
3 36 1 26 1 12.914286 40.385714 7 59.815385 6.830769 13
4 54 66 26 1 64.666667 72.550000 12 59.815385 6.830769 13
... ... ... ... ... ... ... ... ... ... ...
59 55 66 6 1 64.666667 72.550000 12 41.690909 60.172727 11
60 61 4 6 3 43.366667 25.433333 9 41.690909 60.172727 11
61 25 7 19 2 72.353333 36.153333 15 86.275000 22.075000 4
62 33 14 19 1 61.035294 37.152941 17 86.275000 22.075000 4
63 43 11 19 1 77.550000 64.710000 10 86.275000 22.075000 4

64 rows × 10 columns
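  • One thing to watch with replace(): a name that is missing from the mapping dictionary is silently left as a name. A small sketch of a sanity check is given below. The players dictionary here is a two-player stand-in for a name to jersey number mapping such as players_Real, and the pass counts are the two Kroos and Modrić entries from the Real Madrid data:

```python
import pandas as pd

# Hypothetical two-player subset of a name -> jersey number mapping.
players = {"Toni Kroos": "8", "Luka Modrić": "10"}

passes = pd.DataFrame({
    "pass_maker": ["Toni Kroos", "Luka Modrić"],
    "pass_receiver": ["Luka Modrić", "Toni Kroos"],
    "number_of_passes": [5, 4],
})

# Collect every name that appears and check it against the mapping.
names = set(passes["pass_maker"]) | set(passes["pass_receiver"])
missing = names - set(players)

# Same nested-dictionary replace as above, applied per column.
passes_new = passes.replace({"pass_maker": players, "pass_receiver": players})
print(missing)
print(passes_new)
```

An empty missing set means every name was translated to a jersey number.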

  • Now let us visualize the pass networks for both the teams.
In [81]:
pitch = Pitch(pitch_color='grass', goal_type = 'box', line_color='white', stripe = True, 
              constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(pass_Real.pass_maker_x, pass_Real.pass_maker_y,
                         pass_Real.pass_receiver_x, pass_Real.pass_receiver_y, lw = 5,
                         color = 'black', zorder = 1, ax=ax)
nodes = pitch.scatter(av_loc_Real.pass_maker_x, av_loc_Real.pass_maker_y,
                           s=350, color = 'white', edgecolors='black', linewidth=1, alpha = 1, ax = ax)
                          
for index, row in av_loc_Real.iterrows():
    pitch.annotate(players_Real[row.name], xy=(row.pass_maker_x, row.pass_maker_y),
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Real Madrid against Liverpool", size = 20)                   
plt.show()
In [82]:
pitch = Pitch(pitch_color='grass', goal_type = 'box', stripe = True, 
              line_color='white', constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(120 - pass_Liv.pass_maker_x, pass_Liv.pass_maker_y,
                         120 - pass_Liv.pass_receiver_x, pass_Liv.pass_receiver_y, lw = 5,
                         color = 'black', zorder = 1, ax = ax)
nodes = pitch.scatter(120 - av_loc_Liv.pass_maker_x, av_loc_Liv.pass_maker_y,
                           s=350, color = 'red', edgecolors = 'black', linewidth=1, alpha = 1, ax = ax)
                           
for index, row in av_loc_Liv.iterrows():
    pitch.annotate(players_Liv[row.name], xy=(120 - row.pass_maker_x, row.pass_maker_y), 
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Liverpool against Real Madrid", size = 20)
plt.show()
  • In the case of Liverpool's pass network visualization, we subtract the x coordinates from 120 (the pitch length in StatsBomb coordinates) to reverse the x-axis, so that Liverpool is drawn attacking in the opposite direction.
  • Now that we have visualized the pass networks of both teams, we will start analyzing them using metrics from the literature on complex network analysis.

  • Note that both of our networks are directed weighted graphs, with number of passes as the weight for a directed edge.

  • Let us first build a graph isomorphic to the one we just visualized for Real Madrid, but this time using the networkx package. First we keep only the relevant columns of the pass_Real_new dataset:

In [83]:
pass_Real_new = pass_Real_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Real_new
Out[83]:
pass_maker pass_receiver number_of_passes
0 14 2 1
1 7 2 3
2 22 2 2
3 9 2 2
4 10 2 10
... ... ... ...
73 2 1 1
74 9 1 1
75 5 1 2
76 4 1 1
77 8 1 1

78 rows × 3 columns

  • We will next convert pass_Real_new to a list of tuples, one tuple (pass maker, pass receiver, number of passes) per row. We will use this list to add the weighted edges of the networkx graph.
In [84]:
L_Real = pass_Real_new.apply(tuple, axis=1).tolist()
print(L_Real)
[('14', '2', 1), ('7', '2', 3), ('22', '2', 2), ('9', '2', 2), ('10', '2', 10), ('12', '2', 2), ('5', '2', 3), ('4', '2', 3), ('8', '2', 1), ('14', '10', 1), ('7', '10', 1), ('2', '10', 7), ('22', '10', 1), ('12', '10', 1), ('5', '10', 5), ('4', '10', 2), ('8', '10', 5), ('14', '12', 1), ('7', '12', 4), ('22', '12', 2), ('1', '12', 2), ('10', '12', 1), ('4', '12', 9), ('8', '12', 4), ('14', '5', 1), ('2', '5', 5), ('1', '5', 2), ('10', '5', 3), ('12', '5', 2), ('4', '5', 5), ('8', '5', 4), ('14', '4', 1), ('7', '4', 1), ('22', '4', 5), ('9', '4', 1), ('1', '4', 4), ('10', '4', 1), ('12', '4', 2), ('5', '4', 6), ('8', '4', 10), ('14', '8', 6), ('2', '8', 1), ('22', '8', 4), ('9', '8', 4), ('1', '8', 1), ('10', '8', 4), ('12', '8', 5), ('5', '8', 4), ('4', '8', 9), ('7', '9', 1), ('2', '9', 1), ('22', '9', 1), ('1', '9', 1), ('10', '9', 1), ('12', '9', 3), ('5', '9', 1), ('8', '9', 2), ('2', '14', 2), ('9', '14', 2), ('10', '14', 1), ('12', '14', 2), ('5', '14', 1), ('8', '14', 2), ('2', '7', 2), ('22', '7', 2), ('9', '7', 1), ('12', '7', 2), ('4', '7', 1), ('8', '7', 2), ('2', '22', 3), ('12', '22', 4), ('4', '22', 4), ('8', '22', 8), ('2', '1', 1), ('9', '1', 1), ('5', '1', 2), ('4', '1', 1), ('8', '1', 1)]
  • Now, we can draw the directed weighted graph:
In [85]:
G_Real = nx.DiGraph()

for i in range(len(L_Real)):
    G_Real.add_edge(L_Real[i][0], L_Real[i][1], weight = L_Real[i][2])

edges_Real = G_Real.edges()
weights_Real = [G_Real[u][v]['weight'] for u, v in edges_Real]

nx.draw(G_Real, node_size=800, with_labels=True, node_color='white', width = weights_Real)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.title("Pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
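  • As an alternative to the list of tuples and the explicit loop, networkx can build the directed graph straight from a dataframe with from_pandas_edgelist(). The sketch below uses three of the Real Madrid edges printed above and renames the count column to weight so that the edge attribute has the same name as in the loop version; the variable names are illustrative:

```python
import networkx as nx
import pandas as pd

# Three edges taken from the Real Madrid list above (jersey numbers as strings).
edges = pd.DataFrame({
    "pass_maker": ["8", "4", "8"],
    "pass_receiver": ["4", "8", "12"],
    "number_of_passes": [10, 9, 4],
})

# Build a directed graph; each row becomes one edge carrying a 'weight'.
G = nx.from_pandas_edgelist(
    edges.rename(columns={"number_of_passes": "weight"}),
    source="pass_maker",
    target="pass_receiver",
    edge_attr="weight",
    create_using=nx.DiGraph,
)
print(G.edges(data=True))
```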
  • Now for Liverpool too, let us first clean the pass_Liv_new dataset and then draw the isomorphic weighted directed graph:
In [86]:
pass_Liv_new = pass_Liv_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
In [87]:
pass_Liv_new
Out[87]:
pass_maker pass_receiver number_of_passes
0 5 26 4
1 7 26 1
2 14 26 1
3 1 26 1
4 66 26 1
... ... ... ...
59 66 6 1
60 4 6 3
61 7 19 2
62 14 19 1
63 11 19 1

64 rows × 3 columns

In [88]:
L_Liv = pass_Liv_new.apply(tuple, axis=1).tolist()
G_Liv = nx.DiGraph()

for i in range(len(L_Liv)):
    G_Liv.add_edge(L_Liv[i][0], L_Liv[i][1], weight = L_Liv[i][2])

edges_Liv = G_Liv.edges()
weights_Liv = [G_Liv[u][v]['weight'] for u, v in edges_Liv]

nx.draw(G_Liv, node_size = 800, with_labels = True, node_color = 'red', width = weights_Liv)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.show()
  • Let us discuss some of the important pieces of the networkx package that we have employed for drawing graphs:

    • DiGraph is the base class for directed graphs; calling DiGraph() creates an empty directed graph,
    • the add_edge() method adds an edge between the two nodes given by the first two arguments, and the weight keyword sets the weight of this edge,
    • the draw() function visualizes a networkx graph with Matplotlib; here node_size, node_color and with_labels style the nodes, and width sets the line width of each edge (the pass counts in our case).
  • Let us now understand the degree, indegree and outdegree of a node in a directed graph. The indegree of a node is the number of edges directed towards the node, i.e., in our case, the number of distinct teammates from whom a player (node) received passes. Similarly, the outdegree is the number of edges directed outwards from the node, i.e., the number of distinct teammates to whom the player passed. Finally, the degree of a node is the total number of edges connected to it (ignoring the directions of the edges), so the degree of a node is the sum of its indegree and outdegree. Note that the calls below do not pass the weight argument, so these values count passing links rather than individual passes; the total numbers of passes received and given correspond to the weighted indegree and outdegree.

We will use networkx to find out the node degrees from the pass network of Real Madrid.

In [89]:
# Prepare a dictionary with jersey numbers as the node ids, 
# i.e, the dictionary keys and degrees as the dictionary values
deg_Real = dict(nx.degree(G_Real)) 
# convert a dictionary to a pandas dataframe
degree_Real = pd.DataFrame.from_dict(list(deg_Real.items())) 
degree_Real.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
In [90]:
degree_Real
Out[90]:
jersey_number node_degree
0 14 12
1 2 17
2 7 11
3 22 11
4 9 14
5 10 15
6 12 16
7 5 14
8 4 17
9 8 19
10 1 10
  • Out of the 11 starting players for Real Madrid in that game, the player with jersey number 8 (i.e., Toni Kroos) had the highest degree, 19. Ranked joint second, with degree 17, are the players with jersey numbers 2 and 4, i.e., our favorite Spanish defenders 'Daniel Carvajal Ramos' and 'Sergio Ramos García' respectively. Tremendous! Let us use seaborn to visualize the deg_Real dictionary as a bar plot:
In [91]:
X = list(deg_Real.keys())
Y = list(deg_Real.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Real Madrid vs Liverpool", size = 16)
plt.show()
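  • The degrees computed above are unweighted, so they count edges (passing links), not passes. If pass totals are wanted, the degree views accept a weight argument naming the edge attribute to sum. The sketch below uses a small graph made of four of the Real Madrid edges listed earlier, so its numbers are for this toy graph only and not for the full network:

```python
import networkx as nx

# Small directed graph built from four Real Madrid edges shown earlier.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("8", "4", 10), ("4", "8", 9), ("8", "12", 4), ("12", "8", 5),
])

# Unweighted degree: number of incoming plus outgoing edges.
links = dict(G.degree())

# Weighted variants: sums of the 'weight' attribute on the incident edges.
passes_total = dict(G.degree(weight="weight"))
passes_made = dict(G.out_degree(weight="weight"))
passes_received = dict(G.in_degree(weight="weight"))

print(links)
print(passes_total)
```

In this toy graph node 8 has degree 4 but a weighted degree of 28, which illustrates how far the two notions can diverge.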
  • Let us build the dataframe for Liverpool too:
In [92]:
# Prepare a dictionary with jersey numbers as the node ids, 
# i.e, the dictionary keys and degrees as the dictionary values
deg_Liv = dict(nx.degree(G_Liv)) 
# convert a dictionary to a pandas dataframe
degree_Liv = pd.DataFrame.from_dict(list(deg_Liv.items()))
degree_Liv.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Liv
Out[92]:
jersey_number node_degree
0 5 12
1 26 11
2 7 17
3 14 17
4 1 7
5 66 13
6 4 12
7 11 11
8 6 12
9 9 10
10 19 6
  • We see that for Liverpool the degree is highest (17) for the players with jersey numbers 14 and 7, i.e., 'Jordan Brian Henderson' and 'James Philip Milner' respectively. We will visualize the deg_Liv dictionary as a bar plot:
In [93]:
X = list(deg_Liv.keys())
Y = list(deg_Liv.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Liverpool vs Real Madrid", size = 16)
plt.show()
  • We will draw similar bar plots for the indegrees and the outdegrees too:
In [94]:
indeg_Real = dict(G_Real.in_degree()) 
indegree_Real = pd.DataFrame.from_dict(list(indeg_Real.items())) 
indegree_Real.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Real.keys())
Y = list(indeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
In [95]:
indeg_Liv = dict(G_Liv.in_degree()) 
indegree_Liv = pd.DataFrame.from_dict(list(indeg_Liv.items())) 
indegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Liv.keys())
Y = list(indeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
In [96]:
outdeg_Real = dict(G_Real.out_degree()) 
outdegree_Real = pd.DataFrame.from_dict(list(outdeg_Real.items())) 
outdegree_Real.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Real.keys())
Y = list(outdeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
In [97]:
outdeg_Liv = dict(G_Liv.out_degree()) 
outdegree_Liv = pd.DataFrame.from_dict(list(outdeg_Liv.items())) 
outdegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Liv.keys())
Y = list(outdeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
  • Now, let us generate the adjacency matrices for both the G_Real and G_Liv graphs:
In [98]:
A_Real = nx.adjacency_matrix(G_Real)
A_Liv = nx.adjacency_matrix(G_Liv)
A_Real = A_Real.todense()
A_Liv = A_Liv.todense()
In [99]:
sns.heatmap(A_Real, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Real Madrid's pass network")
plt.show()
In [100]:
sns.heatmap(A_Liv, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Liverpool's pass network")
plt.show()
  • If we look at the diagonals of the adjacency matrices, we notice that all the diagonal values are 0. This shows that there are no self-loops at any node, reflecting the fact that a player cannot pass to themselves.
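  • As a small illustration of how such a matrix is laid out, the following plain-Python sketch builds the adjacency matrix of a hypothetical three-player pass list (invented numbers, not the match data) and checks its diagonal. Entry A[i][j] holds the number of passes from player i to player j:

```python
# Hypothetical directed pass list: (passer, receiver, number_of_passes)
toy_passes = [("8", "4", 5), ("4", "8", 3), ("8", "10", 2), ("10", "4", 1)]

# Fix an explicit node order; row i / column j refer to nodes[i] / nodes[j]
nodes = ["4", "8", "10"]
index = {player: i for i, player in enumerate(nodes)}

# Start from an all-zero square matrix and fill in the pass counts
A = [[0] * len(nodes) for _ in nodes]
for passer, receiver, n in toy_passes:
    A[index[passer]][index[receiver]] = n

# A non-zero diagonal entry would mean a player passing to themselves
diagonal = [A[i][i] for i in range(len(nodes))]
has_self_loops = any(diagonal)

for player, row in zip(nodes, A):
    print(player, row)
print("self loops:", has_self_loops)
```

Row sums give the weighted outdegrees and column sums the weighted indegrees. Note that nx.adjacency_matrix orders its rows and columns according to G.nodes() unless a nodelist is given, so the indices 0 to 10 on the heatmap axes follow that node order rather than the jersey numbers.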
  • The next step is to calculate the degree correlation coefficient of a graph. More specifically, we will calculate Pearson's degree correlation coefficient. A positive value of the metric shows an overall positive relationship between the degrees (number of successful passes) of two adjacent nodes (players), whereas a negative value shows an overall negative relationship. If it is 0, there is no relationship. The metric lies in [-1, 1], with -1 indicating a perfect negative relationship and 1 a perfect positive relationship.
In [101]:
r_Real = nx.degree_pearson_correlation_coefficient(G_Real, weight = 'weight')
r_Liv = nx.degree_pearson_correlation_coefficient(G_Liv, weight = 'weight')
print(r_Real, r_Liv)
-0.17983836432860179 -0.2412372196699064
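  • Both values are negative. To build some intuition for what a negative degree correlation means, here is a simplified plain-Python sketch. It assumes an undirected, unweighted toy network (a hypothetical star in which one playmaker links up with four teammates who never link up with each other), whereas the networkx call above works on the directed, weighted pass networks:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical undirected star: playmaker "8" is linked to four teammates
edges = [("8", "2"), ("8", "4"), ("8", "7"), ("8", "10")]

# Unweighted degree of every node
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

# Each undirected edge contributes the degree pair in both orientations
xs, ys = [], []
for u, v in edges:
    xs += [deg[u], deg[v]]
    ys += [deg[v], deg[u]]

r_star = pearson(xs, ys)
print(r_star)
```

Every edge of the star joins the high-degree hub to a low-degree teammate, so the coefficient reaches its minimum of -1. The mildly negative values found for both teams point, more weakly, in the same direction: well-connected players tend to be linked to less-connected ones.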
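  • Now we work on a metric that focuses on the geodesic (shortest-path) distance between two player nodes in a graph. One way to implement this is to take the reciprocal of the 'weight' column (the number of passes) in the pass network, so that players who exchange many passes end up close to each other. Let us create a new graph for Real Madrid: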
In [102]:
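# .copy() gives an independent dataframe, so adding a column
# does not trigger pandas' SettingWithCopyWarning
pass_Real_mod = pass_Real_new[['pass_maker', 'pass_receiver']].copy()
pass_Real_mod['1/nop'] = 1/pass_Real_new['number_of_passes']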
pass_Real_mod.head(5)
Out[102]:
pass_maker pass_receiver 1/nop
0 14 2 1.000000
1 7 2 0.333333
2 22 2 0.500000
3 9 2 0.500000
4 10 2 0.100000
In [103]:
L_Real_mod = pass_Real_mod.apply(tuple, axis=1).tolist()

G_Real_mod = nx.DiGraph()

for i in range(len(L_Real_mod)):
    G_Real_mod.add_edge(L_Real_mod[i][0], L_Real_mod[i][1], weight = L_Real_mod[i][2])

edges_Real_mod = G_Real_mod.edges()
weights_Real_mod = [G_Real_mod[u][v]['weight'] for u, v in edges_Real_mod]

nx.draw(G_Real_mod, node_size=800, with_labels=True, node_color='white', width = weights_Real_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Real Madrid vs Liverpool", size = 20)

plt.show()
  • We will perform the same operations to create a modified graph for Liverpool too:
In [104]:
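# .copy() gives an independent dataframe, so adding a column
# does not trigger pandas' SettingWithCopyWarning
pass_Liv_mod = pass_Liv_new[['pass_maker', 'pass_receiver']].copy()
pass_Liv_mod['1/nop'] = 1/pass_Liv_new['number_of_passes']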
pass_Liv_mod.head(5)
Out[104]:
pass_maker pass_receiver 1/nop
0 5 26 0.25
1 7 26 1.00
2 14 26 1.00
3 1 26 1.00
4 66 26 1.00
In [105]:
L_Liv_mod = pass_Liv_mod.apply(tuple, axis=1).tolist()

G_Liv_mod = nx.DiGraph()

for i in range(len(L_Liv_mod)):
    G_Liv_mod.add_edge(L_Liv_mod[i][0], L_Liv_mod[i][1], weight = L_Liv_mod[i][2])

edges_Liv_mod = G_Liv_mod.edges()
weights_Liv_mod = [G_Liv_mod[u][v]['weight'] for u, v in edges_Liv_mod]

nx.draw(G_Liv_mod, node_size=800, with_labels=True, node_color='red', width = weights_Liv_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Liverpool vs Real Madrid", size = 20)

plt.show()
  • Now, using these modified graphs, we can calculate the all-pairs shortest paths between the nodes (players) for both teams. Let us first compute them for Real Madrid:
In [106]:
dis_Real = nx.shortest_path(G_Real_mod, weight = 'weight')
print(dis_Real)
{'14': {'14': ['14'], '2': ['14', '8', '10', '2'], '10': ['14', '8', '10'], '12': ['14', '8', '4', '12'], '5': ['14', '8', '5'], '4': ['14', '8', '4'], '8': ['14', '8'], '9': ['14', '8', '9'], '7': ['14', '8', '7'], '22': ['14', '8', '22'], '1': ['14', '8', '5', '1']}, '2': {'2': ['2'], '10': ['2', '10'], '5': ['2', '5'], '8': ['2', '10', '8'], '9': ['2', '5', '4', '12', '9'], '14': ['2', '14'], '7': ['2', '7'], '22': ['2', '22'], '1': ['2', '5', '1'], '12': ['2', '5', '4', '12'], '4': ['2', '5', '4']}, '7': {'7': ['7'], '2': ['7', '2'], '10': ['7', '2', '10'], '12': ['7', '12'], '4': ['7', '12', '8', '4'], '9': ['7', '12', '9'], '5': ['7', '2', '5'], '8': ['7', '12', '8'], '14': ['7', '12', '14'], '22': ['7', '12', '22'], '1': ['7', '2', '5', '1']}, '22': {'22': ['22'], '2': ['22', '2'], '10': ['22', '8', '10'], '12': ['22', '4', '12'], '4': ['22', '4'], '8': ['22', '8'], '9': ['22', '4', '12', '9'], '7': ['22', '7'], '5': ['22', '4', '5'], '1': ['22', '4', '5', '1'], '14': ['22', '8', '14']}, '9': {'9': ['9'], '2': ['9', '2'], '4': ['9', '8', '4'], '8': ['9', '8'], '14': ['9', '14'], '7': ['9', '8', '7'], '1': ['9', '1'], '10': ['9', '8', '10'], '12': ['9', '8', '4', '12'], '5': ['9', '8', '5'], '22': ['9', '8', '22']}, '10': {'10': ['10'], '2': ['10', '2'], '12': ['10', '8', '4', '12'], '5': ['10', '2', '5'], '4': ['10', '8', '4'], '8': ['10', '8'], '9': ['10', '8', '9'], '14': ['10', '2', '14'], '7': ['10', '2', '7'], '22': ['10', '8', '22'], '1': ['10', '2', '5', '1']}, '12': {'12': ['12'], '2': ['12', '2'], '10': ['12', '8', '10'], '5': ['12', '8', '5'], '4': ['12', '8', '4'], '8': ['12', '8'], '9': ['12', '9'], '14': ['12', '14'], '7': ['12', '7'], '22': ['12', '22'], '1': ['12', '8', '5', '1']}, '5': {'5': ['5'], '2': ['5', '10', '2'], '10': ['5', '10'], '4': ['5', '4'], '8': ['5', '8'], '9': ['5', '4', '12', '9'], '14': ['5', '8', '14'], '1': ['5', '1'], '12': ['5', '4', '12'], '7': ['5', '8', '7'], '22': ['5', '8', '22']}, '4': {'4': ['4'], '2': ['4', 
'2'], '10': ['4', '8', '10'], '12': ['4', '12'], '5': ['4', '5'], '8': ['4', '8'], '7': ['4', '12', '7'], '22': ['4', '8', '22'], '1': ['4', '5', '1'], '9': ['4', '12', '9'], '14': ['4', '12', '14']}, '8': {'8': ['8'], '2': ['8', '10', '2'], '10': ['8', '10'], '12': ['8', '4', '12'], '5': ['8', '5'], '4': ['8', '4'], '9': ['8', '9'], '14': ['8', '14'], '7': ['8', '7'], '22': ['8', '22'], '1': ['8', '5', '1']}, '1': {'1': ['1'], '12': ['1', '4', '12'], '5': ['1', '4', '5'], '4': ['1', '4'], '8': ['1', '4', '8'], '9': ['1', '4', '12', '9'], '2': ['1', '4', '2'], '10': ['1', '4', '8', '10'], '7': ['1', '4', '12', '7'], '22': ['1', '4', '8', '22'], '14': ['1', '4', '12', '14']}}
  • Suppose we want to calculate the shortest path from 'Keylor Navas Gamboa' (jersey number 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey number 7). We will type the following:
In [107]:
print(dis_Real['1']['7'])
['1', '4', '12', '7']
  • So, we see that, under this distance, the shortest route for the ball from 'Keylor Navas Gamboa' (jersey: 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey: 7) was first to 'Sergio Ramos García' (jersey: 4), then to 'Marcelo Vieira da Silva Júnior' (jersey: 12), and finally to 'Cristiano Ronaldo dos Santos Aveiro'. This seems like a good post-match analysis tool. I got this idea after discussing with Sarath Babu.
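  • To see why the reciprocal weights favour heavily used connections, here is a minimal plain-Python sketch of Dijkstra's algorithm on a hypothetical four-player network. The pass counts are invented for illustration and the function name shortest_pass_route is made up for this sketch; it is not part of networkx:

```python
import heapq

# Hypothetical pass counts between players (not the match data)
toy_passes = {("1", "4"): 10, ("4", "12"): 8, ("12", "7"): 5, ("1", "7"): 1}

# Build adjacency lists with distance = 1 / number_of_passes
graph = {}
for (passer, receiver), n in toy_passes.items():
    graph.setdefault(passer, []).append((receiver, 1 / n))
    graph.setdefault(receiver, [])

def shortest_pass_route(graph, source, target):
    """Dijkstra's algorithm; returns (path, total distance)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            new_d = d + w
            if new_d < dist.get(v, float("inf")):
                dist[v] = new_d
                prev[v] = u
                heapq.heappush(heap, (new_d, v))
    if target not in dist:
        return None, float("inf")
    # Walk back from the target to recover the route
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

path, cost = shortest_pass_route(graph, "1", "7")
print(path, cost)
```

Although a direct edge from 1 to 7 exists, it carries only one pass and therefore has distance 1, while the three-pass route through 4 and 12 uses well-trodden connections and totals about 0.425, so it is the one returned.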
  • Let us do the same analysis for Liverpool:
In [108]:
dis_Liv = nx.shortest_path(G_Liv_mod, weight = 'weight')
print(dis_Liv)
{'5': {'5': ['5'], '26': ['5', '26'], '7': ['5', '26', '7'], '14': ['5', '14'], '4': ['5', '4'], '11': ['5', '11'], '66': ['5', '26', '7', '66'], '9': ['5', '26', '9'], '1': ['5', '14', '1'], '6': ['5', '14', '6'], '19': ['5', '26', '7', '19']}, '26': {'26': ['26'], '5': ['26', '5'], '7': ['26', '7'], '14': ['26', '14'], '9': ['26', '9'], '4': ['26', '4'], '11': ['26', '9', '11'], '66': ['26', '7', '66'], '1': ['26', '14', '1'], '6': ['26', '14', '6'], '19': ['26', '7', '19']}, '7': {'7': ['7'], '26': ['7', '66', '5', '26'], '5': ['7', '66', '5'], '14': ['7', '14'], '9': ['7', '66', '9'], '4': ['7', '4'], '1': ['7', '1'], '11': ['7', '66', '11'], '66': ['7', '66'], '6': ['7', '14', '6'], '19': ['7', '19']}, '14': {'14': ['14'], '26': ['14', '5', '26'], '5': ['14', '5'], '7': ['14', '7'], '4': ['14', '4'], '1': ['14', '1'], '66': ['14', '7', '66'], '6': ['14', '6'], '19': ['14', '7', '19'], '11': ['14', '7', '66', '11'], '9': ['14', '5', '26', '9']}, '1': {'1': ['1'], '26': ['1', '26'], '14': ['1', '14'], '4': ['1', '6', '4'], '6': ['1', '6'], '7': ['1', '6', '7'], '11': ['1', '6', '66', '11'], '66': ['1', '6', '66'], '5': ['1', '6', '66', '5'], '9': ['1', '6', '66', '9'], '19': ['1', '6', '7', '19']}, '66': {'66': ['66'], '26': ['66', '5', '26'], '5': ['66', '5'], '14': ['66', '14'], '9': ['66', '9'], '11': ['66', '11'], '6': ['66', '14', '6'], '7': ['66', '14', '7'], '4': ['66', '5', '4'], '19': ['66', '11', '19'], '1': ['66', '14', '1']}, '4': {'4': ['4'], '26': ['4', '26'], '5': ['4', '26', '5'], '14': ['4', '26', '14'], '66': ['4', '6', '66'], '6': ['4', '6'], '7': ['4', '26', '7'], '9': ['4', '26', '9'], '1': ['4', '6', '1'], '11': ['4', '6', '66', '11'], '19': ['4', '26', '7', '19']}, '11': {'11': ['11'], '5': ['11', '66', '5'], '7': ['11', '9', '7'], '9': ['11', '9'], '4': ['11', '4'], '66': ['11', '66'], '19': ['11', '19'], '14': ['11', '9', '14'], '6': ['11', '9', '14', '6'], '26': ['11', '66', '5', '26'], '1': ['11', '9', '14', '1']}, '6': {'6': ['6'], 
'7': ['6', '7'], '14': ['6', '66', '14'], '4': ['6', '4'], '1': ['6', '1'], '11': ['6', '66', '11'], '66': ['6', '66'], '26': ['6', '4', '26'], '5': ['6', '66', '5'], '9': ['6', '66', '9'], '19': ['6', '7', '19']}, '9': {'9': ['9'], '7': ['9', '7'], '14': ['9', '14'], '11': ['9', '11'], '66': ['9', '11', '66'], '6': ['9', '14', '6'], '5': ['9', '14', '5'], '4': ['9', '14', '4'], '19': ['9', '7', '19'], '26': ['9', '14', '5', '26'], '1': ['9', '14', '1']}, '19': {'19': ['19'], '7': ['19', '7'], '14': ['19', '14'], '9': ['19', '9'], '11': ['19', '9', '11'], '66': ['19', '9', '11', '66'], '6': ['19', '14', '6'], '5': ['19', '14', '5'], '4': ['19', '14', '4'], '26': ['19', '14', '5', '26'], '1': ['19', '14', '1']}}
In [109]:
print(dis_Liv['1']['9'])
['1', '6', '66', '9']
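  • Now we will calculate another important metric called eccentricity, which is based on shortest distances. The eccentricity of a player node p tells us how far the furthest player node from p lies in the pass network. As called below, nx.eccentricity measures this distance in hops (the number of passes in the chain) and does not use the 1/nop edge weights, which is why the results are whole numbers. Let us calculate the eccentricities of all 11 nodes for Real Madrid.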
In [110]:
E_Real = nx.eccentricity(G_Real_mod)
print(E_Real)
{'14': 2, '2': 2, '7': 2, '22': 2, '9': 2, '10': 2, '12': 2, '5': 2, '4': 2, '8': 1, '1': 2}
  • We can calculate the average eccentricity:
In [111]:
av_E_Real = sum(list(E_Real.values()))/len(E_Real)
print(av_E_Real)
1.9090909090909092
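  • The hop-based eccentricity can be reproduced with a breadth-first search. The sketch below is a plain-Python illustration on a hypothetical four-player network (invented for this example) in which player 8 exchanges passes with everyone and the others only with player 8:

```python
from collections import deque

# Hypothetical directed pass network as adjacency lists
toy = {"8": ["4", "10", "7"], "4": ["8"], "10": ["8"], "7": ["8"]}

def eccentricity(graph, source):
    """Largest number of hops needed to reach any other node from source."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    if len(hops) < len(graph):
        raise ValueError("not every player is reachable from " + source)
    return max(hops.values())

ecc = {player: eccentricity(toy, player) for player in toy}
avg_ecc = sum(ecc.values()) / len(ecc)
print(ecc, avg_ecc)
```

The hub player reaches everyone in one pass (eccentricity 1), while the others need two, which mirrors the pattern seen above, where jersey number 8 is the only Real Madrid player with eccentricity 1.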
  • For Liverpool:
In [112]:
E_Liv = nx.eccentricity(G_Liv_mod)
print(E_Liv)
{'5': 2, '26': 2, '7': 1, '14': 2, '1': 2, '66': 2, '4': 2, '11': 2, '6': 2, '9': 2, '19': 2}
  • We can calculate the average eccentricity:
In [113]:
av_E_Liv = sum(list(E_Liv.values()))/len(E_Liv)
print(av_E_Liv)
1.9090909090909092
  • We can also calculate the average clustering coefficient of a graph. Let us first compute this metric for G_Real (note that this must be the original graph, not the modified version):
In [114]:
cc_Real = nx.average_clustering(G_Real, weight = 'weight')
print(cc_Real)
0.182334851979709
  • For Liverpool:
In [115]:
cc_Liv = nx.average_clustering(G_Liv, weight = 'weight')
print(cc_Liv)
0.27664278424505534
  • The average clustering coefficient lies in the range [0, 1]. A value of 0 means that no two neighbours of any node are connected to each other (the network contains no triangles), and, in the unweighted case, a value of 1 means that the network is a clique, that is, each node is connected to all the other nodes of the network. Interestingly, the average clustering coefficient is lower for Real Madrid's pass network than for Liverpool's, which suggests that Real Madrid's players formed fewer (or less heavily used) passing triangles among themselves.
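  • The two extreme values can be checked by hand. The following plain-Python sketch is a simplified version for undirected, unweighted graphs (the networkx call above uses a directed, weighted generalisation), applied to two hypothetical toy networks:

```python
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbours: coefficient is 0
        # Count how many pairs of neighbours are themselves connected
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

# Hypothetical clique: three players who all pass to one another
triangle = {"4": {"8", "10"}, "8": {"4", "10"}, "10": {"4", "8"}}
# Hypothetical star: everything goes through player 8
star = {"8": {"4", "10", "7"}, "4": {"8"}, "10": {"8"}, "7": {"8"}}

cc_triangle = average_clustering(triangle)
cc_star = average_clustering(star)
print(cc_triangle, cc_star)
```

The triangle reaches the maximum of 1, while the star, despite being fully connected through its hub, scores 0 because none of the hub's neighbours are linked to each other.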
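  • Finally, we can compute the centrality (specifically, the betweenness centrality) of each node in either team's pass network to understand which player was the most important link in it. The betweenness centrality of a player is the fraction of shortest paths between pairs of other players that pass through that player. Note that networkx interprets the weight argument as a distance when finding these shortest paths, so with G_Real an edge carrying many passes counts as a long one; running the same call on the modified graph G_Real_mod would treat frequent connections as short instead. For Real Madrid: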
In [116]:
bc_Real = nx.betweenness_centrality(G_Real, weight = 'weight')
print(bc_Real)
{'14': 0.15222222222222223, '2': 0.10685185185185186, '7': 0.05592592592592593, '22': 0.0, '9': 0.14462962962962964, '10': 0.12407407407407407, '12': 0.009259259259259259, '5': 0.007407407407407408, '4': 0.06851851851851852, '8': 0.031481481481481485, '1': 0.11703703703703704}
  • We can find the node with the maximum betweenness centrality:
In [117]:
max_bc_Real = max(bc_Real, key = bc_Real.get)
print(max_bc_Real)
14
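  • For readers curious about what happens under the hood, here is a compact plain-Python sketch of Brandes' algorithm for betweenness centrality. It is a simplified illustration that ignores edge weights (every pass link counts as one hop) and runs on a hypothetical four-player network, so it is not a drop-in replacement for the weighted networkx call above:

```python
from collections import deque

def betweenness(graph):
    """Normalised betweenness centrality of an unweighted directed graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate dependencies from the farthest nodes back to s
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # Divide by the number of ordered pairs of other nodes (directed case)
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}

# Hypothetical network: every pass chain between teammates runs through 8
toy = {"8": ["4", "10", "7"], "4": ["8"], "10": ["8"], "7": ["8"]}
bc_toy = betweenness(toy)
print(bc_toy, max(bc_toy, key=bc_toy.get))
```

In this toy network every shortest path between two of the outer players runs through player 8, who therefore gets the maximum value of 1, while the others get 0.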
  • For Liverpool:
In [118]:
bc_Liv = nx.betweenness_centrality(G_Liv, weight = 'weight')
print(bc_Liv)
max_bc_Liv = max(bc_Liv, key = bc_Liv.get)
print(max_bc_Liv)
{'5': 0.06296296296296296, '26': 0.016666666666666666, '7': 0.2453703703703704, '14': 0.12407407407407407, '1': 0.002777777777777778, '66': 0.075, '4': 0.07222222222222222, '11': 0.05555555555555556, '6': 0.1259259259259259, '9': 0.021296296296296296, '19': 0.03888888888888889}
7
  • So we see that the betweenness centrality is highest for 'Carlos Henrique Casimiro' (jersey: 14) from Real Madrid and 'James Philip Milner' (jersey: 7) from Liverpool. We have been able to compute some interesting results using complex network analysis on our pass networks. This completes my presentation. 😌😌😌😌😌😌😌😌😌

References¶

  • FCPython Blog,
  • Book Soccermatics by Dr. David Sumpter,
  • Friends of Tracking youtube channel managed by Dr. Sumpter,
  • Youtube channel by McKay Johns, and
  • Book Graph Theory and Complex Networks: An Introduction by Dr. Maarten van Steen

The End! Thank You! Wear Masks 😷😷, Get Vaccinated and Stay Safe!¶